source("http://www.openintro.org/stat/data/cdc.R")
names(cdc)
## [1] "genhlth" "exerany" "hlthplan" "smoke100" "height" "weight"
## [7] "wtdesire" "age" "gender"
head(cdc, n=10)
##      genhlth exerany hlthplan smoke100 height weight wtdesire age gender
## 1       good       0        1        0     70    175      175  77      m
## 2       good       0        1        1     64    125      115  33      f
## 3       good       1        1        1     60    105      105  49      f
## 4       good       1        1        0     66    132      124  42      f
## 5  very good       0        1        0     61    150      130  55      f
## 6  very good       1        1        0     64    114      114  55      f
## 7  very good       1        1        0     71    194      185  31      m
## 8  very good       0        1        0     67    170      160  45      m
## 9       good       0        1        1     65    150      130  27      f
## 10      good       1        1        0     70    180      170  44      m
tail(cdc, n=10)
##         genhlth exerany hlthplan smoke100 height weight wtdesire age
## 19991 excellent       1        1        0     71    195      190  43
## 19992 very good       1        1        1     72    210      175  52
## 19993 very good       1        1        0     71    180      180  36
## 19994 very good       0        1        1     63    165      120  31
## 19995      good       0        1        1     69    224      224  73
## 19996      good       1        1        0     66    215      140  23
## 19997 excellent       0        1        0     73    200      185  35
## 19998      poor       0        1        0     65    216      150  57
## 19999      good       1        1        0     67    165      165  81
## 20000      good       1        1        1     69    170      165  83
##       gender
## 19991      m
## 19992      m
## 19993      m
## 19994      f
## 19995      m
## 19996      f
## 19997      m
## 19998      f
## 19999      f
## 20000      m
# Cases and Variables in the Dataset
dim(cdc)
## [1] 20000 9
genhlth: Categorical (ordinal)
exerany: Categorical (binary, coded 0/1)
hlthplan: Categorical (binary, coded 0/1)
smoke100: Categorical (binary, coded 0/1)
height: Numerical, Continuous
weight: Numerical, Continuous
wtdesire: Numerical, Continuous
age: Numerical, Continuous
gender: Categorical
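One way to check how R stores each variable is to look at the column classes. The sketch below uses a small stand-in data frame (`toy`, a hypothetical name) built from a few of the rows printed by `head()`, so that it runs without downloading the data. It illustrates that 0/1 indicator columns are stored as numbers even though they record yes/no answers, and shows one possible way to recode them as factors.

```r
# A tiny stand-in for cdc, copied from the first five rows of head(cdc).
toy <- data.frame(
  genhlth  = factor(c("good", "good", "good", "good", "very good"),
                    levels = c("excellent", "very good", "good", "fair", "poor")),
  exerany  = c(0, 0, 1, 1, 0),
  smoke100 = c(0, 1, 1, 0, 0),
  height   = c(70, 64, 60, 66, 61),
  gender   = factor(c("m", "f", "f", "f", "f"), levels = c("m", "f"))
)

# Storage class of each column: the 0/1 indicators come back as "numeric".
col_classes <- sapply(toy, class)
print(col_classes)

# Recode an indicator as a labelled factor so R treats it as categorical.
# The labels "no" and "yes" are an assumption made for this illustration.
toy$exerany_f <- factor(toy$exerany, levels = c(0, 1), labels = c("no", "yes"))
print(table(toy$exerany_f))
```

The same two calls should work on the full data frame, for example `sapply(cdc, class)`.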
summary(cdc$weight)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    68.0   140.0   165.0   169.7   190.0   500.0
mean(cdc$weight)
## [1] 169.683
var(cdc$weight)
## [1] 1606.484
median(cdc$weight)
## [1] 165
table(cdc$smoke100)
##
##     0     1
## 10559  9441
table(cdc$smoke100)/20000
##
##       0       1
## 0.52795 0.47205
barplot(table(cdc$smoke100))
smoke <- table(cdc$smoke100)
barplot(smoke)
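Dividing by a hard-coded 20000 works here, but `prop.table()` gives the same relative frequencies without typing the sample size by hand. The sketch below rebuilds the table from the counts printed above so that it runs on its own; `smoke_counts` and `smoke_props` are names chosen for this illustration.

```r
# Counts copied from the table(cdc$smoke100) output.
smoke_counts <- as.table(c("0" = 10559, "1" = 9441))

# prop.table() divides each count by the total of the table.
smoke_props <- prop.table(smoke_counts)
print(smoke_props)

# Equivalent by-hand version, using the table's own total.
print(smoke_counts / sum(smoke_counts))
```

With the full data loaded, `prop.table(table(cdc$smoke100))` should give the same result.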
2. Create a numerical summary for height and age, and compute the interquartile range for each. Compute the relative frequency distribution for gender and exerany. How many males are in the sample? What proportion of the sample reports being in excellent health?
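height interquartile range: 70 - 64 = 6
age interquartile range: 57 - 31 = 26
gender relative frequencies: m = 9569/20000 = 0.47845, f = 10431/20000 = 0.52155
exerany relative frequencies: 1 = 0.7457, 0 = 0.2543 (the mean of a 0/1 variable is the proportion of 1s)
males: 9569
excellent health: 4657/20000 = 0.23285, about 23% of the sample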
summary(cdc$height)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   48.00   64.00   67.00   67.18   70.00   93.00
mean(cdc$height)
## [1] 67.1829
var(cdc$height)
## [1] 17.0235
median(cdc$height)
## [1] 67
summary(cdc$age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   18.00   31.00   43.00   45.07   57.00   99.00
mean(cdc$age)
## [1] 45.06825
var(cdc$age)
## [1] 295.5886
median(cdc$age)
## [1] 43
summary(cdc$gender)
##     m     f
##  9569 10431
summary(cdc$exerany)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  0.0000  0.0000  1.0000  0.7457  1.0000  1.0000
summary(cdc$genhlth)
## excellent very good      good      fair      poor
##      4657      6972      5675      2019       677
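The interquartile range is the third quartile minus the first, so it can be read off the `summary()` output or computed directly with `IQR()`. The sketch below does both. The ten heights are the rows printed by `head(cdc, n = 10)` and serve only as illustrative input, so their IQR will differ from that of the full sample. The general-health counts are copied from the output above.

```r
# Quartiles read off the summary() output for height and age.
iqr_height <- 70 - 64
iqr_age    <- 57 - 31

# IQR() computes the same quantity directly from raw values.
# Illustrative input: the ten heights printed by head(cdc, n = 10).
heights_head <- c(70, 64, 60, 66, 61, 64, 71, 67, 65, 70)
iqr_head <- IQR(heights_head)

# Counts copied from the summary(cdc$genhlth) output.
genhlth_counts <- c(excellent = 4657, "very good" = 6972, good = 5675,
                    fair = 2019, poor = 677)
prop_excellent <- genhlth_counts[["excellent"]] / sum(genhlth_counts)

print(c(iqr_height = iqr_height, iqr_age = iqr_age, iqr_head = iqr_head))
print(prop_excellent)
```

On the full data, `IQR(cdc$height)` and `IQR(cdc$age)` should reproduce the differences computed from the quartiles.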
table(cdc$gender,cdc$smoke100)
##
##        0    1
##   m 4547 5022
##   f 6012 4419
mosaicplot(table(cdc$gender,cdc$smoke100))
3. What does the mosaic plot reveal about smoking habits and gender?
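A larger proportion of males than females are smokers by the smoke100 measure: 5022 of 9569 males (about 52%) versus 4419 of 10431 females (about 42%).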
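Because the two groups differ in size, comparing proportions within each gender is more informative than comparing raw counts. A minimal sketch, rebuilding the two-way table from the counts printed above (`smoke_by_gender` and `row_props` are names chosen for this illustration):

```r
# Counts copied from the table(cdc$gender, cdc$smoke100) output.
smoke_by_gender <- matrix(
  c(4547, 5022,
    6012, 4419),
  nrow = 2, byrow = TRUE,
  dimnames = list(gender = c("m", "f"), smoke100 = c("0", "1"))
)

# margin = 1 divides each row by its own total, so every row sums to 1.
row_props <- prop.table(smoke_by_gender, margin = 1)
print(round(row_props, 3))
```

With the full data loaded, `prop.table(table(cdc$gender, cdc$smoke100), margin = 1)` should give the same row proportions.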
cdc[567,6]
## [1] 160
bmi <- (cdc$weight / cdc$height^2) * 703
boxplot(bmi ~ cdc$genhlth)
The box plot shows that people with lower BMIs tend to report better general health.
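The BMI formula used above takes weight in pounds and height in inches, with 703 as the conversion factor. The sketch below wraps it in a hypothetical helper, `bmi_imperial()`, and compares group medians numerically. The input is only the ten rows printed by `head(cdc, n = 10)`, which is far too few to support any conclusion; it simply shows the mechanics.

```r
# Hypothetical helper: BMI from weight in pounds and height in inches.
bmi_imperial <- function(weight_lb, height_in) {
  (weight_lb / height_in^2) * 703
}

# The ten rows printed by head(cdc, n = 10); illustrative input only.
toy <- data.frame(
  genhlth = c("good", "good", "good", "good", "very good",
              "very good", "very good", "very good", "good", "good"),
  height  = c(70, 64, 60, 66, 61, 64, 71, 67, 65, 70),
  weight  = c(175, 125, 105, 132, 150, 114, 194, 170, 150, 180)
)
toy$bmi <- bmi_imperial(toy$weight, toy$height)

# Median BMI within each general-health category.
median_bmi <- tapply(toy$bmi, toy$genhlth, median)
print(round(median_bmi, 1))
```

On the full data, `tapply(bmi, cdc$genhlth, median)` would give the medians that the box plot draws as center lines.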
bmi <- (cdc$weight / cdc$height^2) * 703
boxplot(bmi ~ cdc$smoke100)
People who don't smoke typically fall in a lower BMI range.
plot(cdc$weight ~ cdc$wtdesire)
The relationship is generally upward sloping, so most people want to weigh about what they currently weigh. Some people want to be significantly heavier or lighter, but most do not.
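Consider the difference between desired weight (wtdesire) and current weight (weight). Create this new variable by subtracting the two columns in the data frame and assigning them to a new object called wdiff.
# Stored as wtdiff here: current weight minus desired weight.
wtdiff <- cdc$weight - cdc$wtdesire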
plot(wtdiff)
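wdiff? If an observation wdiff is 0, what does this mean about the person’s weight and desired weight? What if wdiff is positive or negative?
If wtdiff is 0, the person's current weight equals their desired weight. Because wtdiff is computed here as weight - wtdesire, a positive value means the person weighs more than they would like and wants to lose weight, while a negative value means the person wants to gain weight.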
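The sign of the difference can be turned into a label for each person. This sketch assumes the difference is current weight minus desired weight, as in the code above, and uses the ten rows printed by `head(cdc, n = 10)` as illustrative input. The labels "lose", "gain", and "stay" are chosen for this example.

```r
# Weights and desired weights from the ten rows printed by head(cdc, n = 10).
weight   <- c(175, 125, 105, 132, 150, 114, 194, 170, 150, 180)
wtdesire <- c(175, 115, 105, 124, 130, 114, 185, 160, 130, 170)

# Current minus desired: positive means heavier than desired.
wtdiff <- weight - wtdesire

# Label each person by the sign of the difference.
goal <- ifelse(wtdiff > 0, "lose",
               ifelse(wtdiff < 0, "gain", "stay"))
print(table(goal))
```

If the difference were defined the other way round (desired minus current), the "lose" and "gain" labels would swap.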
Describe the distribution of wdiff in terms of its center, shape, and spread, including any plots you use. What does this tell us about how people feel about their current weight?
Most people seem comfortable with their current weight and do not want to move far from it.
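Center and spread can be backed up with numbers, and shape with a histogram. The sketch below uses the ten differences (current minus desired weight) from the rows printed by `head(cdc, n = 10)`. These values are illustrative only and do not describe the full sample.

```r
# Current minus desired weight for the ten rows printed by head(cdc, n = 10).
wtdiff <- c(0, 10, 0, 8, 20, 0, 9, 10, 20, 10)

# Center: the median is less sensitive to extreme values than the mean.
center <- median(wtdiff)

# Spread: interquartile range.
spread <- IQR(wtdiff)

print(summary(wtdiff))
print(c(median = center, IQR = spread))

# Shape: bin counts for a histogram. plot = FALSE returns the bins without
# drawing; drop that argument to draw the histogram itself.
bins <- hist(wtdiff, plot = FALSE)
print(bins$counts)
```

On the full data, `summary(wtdiff)`, `IQR(wtdiff)`, and `hist(wtdiff)` would give the corresponding description for all 20000 respondents.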