Load the CDC dataset:
source("http://www.openintro.org/stat/data/cdc.R")
EXERCISE 1.
dim(cdc)
## [1] 20000 9
EXERCISE 2.
summary(cdc$height)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   48.00   64.00   67.00   67.18   70.00   93.00
summary(cdc$age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   18.00   31.00   43.00   45.07   57.00   99.00
57-31
## [1] 26
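The subtraction above is the interquartile range of age (third quartile minus first quartile). As a minimal sketch, the same quantity can be obtained with base R's `quantile()` and `IQR()`. The vector `ages` below is made-up toy data, not the CDC sample; on the real data the equivalent call would be `IQR(cdc$age)`.

```r
# Toy ages (hypothetical values, not taken from cdc)
ages <- c(18, 25, 31, 43, 57, 60, 72)

# First and third quartiles, using R's default quantile definition
q <- quantile(ages, probs = c(0.25, 0.75))

# Interquartile range by hand (Q3 - Q1) and with the built-in helper
iqr_by_hand <- unname(q[2] - q[1])
iqr_builtin <- IQR(ages)
```

Both values agree because `IQR()` is defined as the difference between those two quantiles.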
EXERCISE 3.
table(cdc$gender)
##
##     m     f
##  9569 10431
table(cdc$genhlth)/20000
##
## excellent very good      good      fair      poor
##   0.23285   0.34860   0.28375   0.10095   0.03385
What percent report being in excellent health? 23.285%
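Dividing by 20000 works because the sample size is known. A sketch of an alternative that does not hard-code the sample size uses `prop.table()`. The factor `genhlth` below is made-up toy data, not the CDC sample.

```r
# Toy health ratings (hypothetical values, not taken from cdc)
genhlth <- factor(
  c("excellent", "good", "good", "fair", "excellent", "poor", "good", "very good"),
  levels = c("excellent", "very good", "good", "fair", "poor")
)

counts <- table(genhlth)

# Relative frequencies by explicit division and with prop.table()
by_division   <- counts / length(genhlth)
by_prop_table <- prop.table(counts)

# The same proportions expressed as percentages, rounded to one decimal
percent <- round(100 * by_prop_table, 1)
```

On the real data the corresponding call would be `prop.table(table(cdc$genhlth))`.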
Create a mosaic plot to see the smoking habits based on gender.
mosaicplot(table(cdc$gender,cdc$smoke100))
f. What does the mosaic plot reveal about smoking habits and gender?
A larger share of males than females report having smoked at least 100 cigarettes.
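The mosaic plot shows the comparison visually. A sketch of how the same comparison could be put into numbers is to take row proportions of the two-way table with `prop.table(..., margin = 1)`. The vectors below are made-up toy data, not the CDC sample.

```r
# Toy data (hypothetical values, not taken from cdc)
gender   <- factor(c("m", "m", "m", "f", "f", "f", "f", "m"), levels = c("m", "f"))
smoke100 <- c(1, 1, 0, 0, 1, 0, 0, 1)

# Two-way table of counts: rows are gender, columns are smoke100 (0 or 1)
tab <- table(gender, smoke100)

# Row proportions: within each gender, the share with smoke100 = 0 and = 1
row_props <- prop.table(tab, margin = 1)
```

On the real data the corresponding call would be `prop.table(table(cdc$gender, cdc$smoke100), margin = 1)`.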
EXERCISE 4.
under45_and_smoke <- subset(cdc, age < 45 & smoke100 == 1)
Based on looking at the first rows of under45_and_smoke, how do you know that your object is correct (that you used the right command in step a.)? head(under45_and_smoke) shows 6 rows, and every one of them has an age under 45 and a smoke100 value of 1.
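Selecting respondents who are under 45 and have smoked 100 cigarettes needs the logical operator `&`, not arithmetic on the two columns. The sketch below contrasts the two on a made-up data frame `toy` (hypothetical values, not the CDC sample).

```r
# Toy data (hypothetical values, not taken from cdc)
toy <- data.frame(
  age      = c(22, 50, 30, 67, 41),
  smoke100 = c(1, 1, 0, 0, 1)
)

# Logical AND keeps only rows where both conditions hold
under45_and_smoke_toy <- subset(toy, age < 45 & smoke100 == 1)

# Division is evaluated before "<", so this compares age with 45 / smoke100.
# For non-smokers 45 / 0 is Inf, so every non-smoker passes the test.
wrong_flag <- toy$age < 45 / toy$smoke100
```

Here `under45_and_smoke_toy` keeps the rows with ages 22 and 41, while `wrong_flag` is `TRUE` for both non-smokers as well, including the 67-year-old.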
Obtain the frequency distribution of the age of those who have smoked 100 cigarettes or more.
##
##    18    19    20    21    22    23    24    25    26
##    92   110   127   154   137   151   158   125   141
##    27    28    29    30    31    32    33    34    35
##   150   166   158   170   152   142   159   156   201
##    36    37    38    39    40    41    42    43    44
##   187   198   186   202   237   186   226   254   192
##    45    46    47    48    49    50    51    52    53
##   230   186   202   195   178   201   128   198   181
##    54    55    56    57    58    59    60    61    62
##   132   169   132   160   148   135   121   113   126
##    63    64    65    66    67    68    69    70    71
##   107   131   152    95   108   123   106   111   112
##    72    73    74    75    76    77    78    79    80
##   110    96   104    88    83    65    67    64    62
##    81    82    83    84    85    86    87    88    89
##    35    40    36    25    26     7     9     6     5
##    90    91    92    93    96    97    99   Inf
##     1     6     4     2     1     1     1 10559
The Inf column is not an age. It counts the 10559 respondents with smoke100 equal to 0, whose age divided by 0 is Inf, so it is not part of the smokers' age distribution. The remaining counts sum to 9441.
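An `Inf` category in a frequency table like the one above is what dividing age by a 0/1 indicator produces. A sketch of how the table could be restricted to smokers with logical indexing instead, shown on a made-up data frame `toy` (hypothetical values, not the CDC sample):

```r
# Toy data (hypothetical values, not taken from cdc)
toy <- data.frame(
  age      = c(22, 22, 30, 30, 41),
  smoke100 = c(1, 0, 1, 1, 0)
)

# Age distribution of smokers only, selected by a logical index
smoker_age_counts <- table(toy$age[toy$smoke100 == 1])

# Dividing by the indicator keeps smokers' ages (age / 1)
# but turns every non-smoker into Inf (age / 0)
divided_counts <- table(toy$age / toy$smoke100)
```

On the real data the corresponding call would be `table(cdc$age[cdc$smoke100 == 1])`.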
Obtain a barplot of the frequency distribution. (Put the smoking variable first and then age within the table for the barplot command to get a nice plot.)
barplot(table(cdc$smoke100, cdc$age))
EXERCISE 5.
boxplot(cdc$height ~ cdc$gender)
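summary(cdc$height[cdc$gender == "m"])
Note that summary(cdc$height, cdc$gender == "m") does not restrict the data to males: the second argument is ignored, so it returns the summary for all 20000 respondents, identical to Exercise 2:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   48.00   64.00   67.00   67.18   70.00   93.00
How do the summary statistics for males compare to what you see in the boxplot? The median and quartiles of the male-only summary should match the median line and the box edges for males in the boxplot.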
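To compare summary statistics with a grouped boxplot, the statistics need to be computed per group. A sketch of two ways to do that in base R, using a made-up data frame `toy` (hypothetical values, not the CDC sample):

```r
# Toy data (hypothetical values, not taken from cdc)
toy <- data.frame(
  height = c(70, 72, 64, 62, 68, 66),
  gender = factor(c("m", "m", "f", "f", "m", "f"), levels = c("m", "f"))
)

# Summary for males only: subset the vector before summarising
male_summary <- summary(toy$height[toy$gender == "m"])

# Passing the condition as a second argument does not filter anything;
# the result is the same as the summary of the whole column
all_summary <- summary(toy$height)
ignored_arg <- summary(toy$height, toy$gender == "m")

# One statistic per group in a single call
median_by_gender <- tapply(toy$height, toy$gender, median)
```

The per-group medians from `tapply()` are the values drawn as the centre lines in `boxplot(height ~ gender, data = toy)`.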
Find the BMI for each respondent and create a boxplot comparing it across general health categories.
bmi <- (cdc$weight / cdc$height^2) * 703
boxplot(bmi ~ cdc$genhlth)
e. What does this box plot of bmi against general health show? BMI
tends to be higher in the groups that report poorer general health.
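The factor 703 in the BMI formula converts pounds and inches to the metric definition (kilograms divided by metres squared). A small worked sketch, with a hypothetical helper `bmi_imperial()` and made-up inputs of 150 lb and 65 in:

```r
# Hypothetical helper: BMI from weight in pounds and height in inches
bmi_imperial <- function(weight_lb, height_in) {
  (weight_lb / height_in^2) * 703
}

# Made-up example person: 150 lb, 65 in
example_bmi <- bmi_imperial(150, 65)

# Cross-check with the metric definition, kg / m^2
# (1 lb = 0.45359237 kg, 1 in = 0.0254 m)
weight_kg  <- 150 * 0.45359237
height_m   <- 65 * 0.0254
metric_bmi <- weight_kg / height_m^2
```

Both routes give a BMI of about 24.96; the small difference comes from 703 being a rounded conversion factor.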
boxplot(bmi ~ cdc$age)
g. List the variable you chose, why you might think it would have a
relationship to BMI, and indicate what the figure seems to suggest. I
chose age as the variable. I thought BMI might be higher in the middle
age range, as metabolism slows down. The figure seems to suggest that I
was generally correct: the higher BMI range seems to run from the
mid-30s to about 58.
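Plotting BMI against every individual age gives one narrow box per year, which can be hard to read. A sketch of one possible alternative is to bin age into groups with `cut()` first. The break points and the vectors below are assumptions chosen for illustration, not values from the lab or the CDC sample.

```r
# Toy data (hypothetical values, not taken from cdc)
age     <- c(19, 28, 37, 45, 52, 61, 70, 88)
bmi_toy <- c(22, 24, 27, 28, 29, 27, 26, 24)

# Bin age into four groups; right = FALSE makes intervals of the form [a, b)
age_group <- cut(age, breaks = c(18, 30, 45, 60, 100), right = FALSE)

# Median BMI within each age group
median_by_group <- tapply(bmi_toy, age_group, median)

# boxplot(bmi_toy ~ age_group) would then draw one box per age group
```

With the real data the same idea would apply to `cdc$age` and the `bmi` vector computed above.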