## [1] 20000
## [1] 9
## [1] "genhlth" "exerany" "hlthplan" "smoke100" "height" "weight" "wtdesire"
## [8] "age" "gender"
## genhlth exerany hlthplan smoke100 height weight wtdesire age gender
## 1 good 0 1 0 70 175 175 77 m
## 2 good 0 1 1 64 125 115 33 f
## 3 good 1 1 1 60 105 105 49 f
## 4 good 1 1 0 66 132 124 42 f
## 5 very good 0 1 0 61 150 130 55 f
## 6 very good 1 1 0 64 114 114 55 f
## [1] "factor"
The data set contains 20,000 cases (one per respondent) and 9 variables.
genhlth: categorical (ordinal, five levels)
exerany: categorical (binary)
hlthplan: categorical (binary)
smoke100: categorical (binary)
height: numeric
weight: numeric
wtdesire: numeric
age: numeric
gender: categorical (binary)
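The counts and variable types listed above can be checked with a few base R calls. The sketch below is only an illustration: it rebuilds the six rows printed by the head of the data into a stand-in data frame (the name cdc_toy is hypothetical), and it assumes the categorical columns are stored as factors, as the "factor" output suggests for genhlth.

```r
# Six-row stand-in copied from the printed head of the data (cdc_toy is a hypothetical name)
cdc_toy <- data.frame(
  genhlth  = c("good", "good", "good", "good", "very good", "very good"),
  exerany  = c(0, 0, 1, 1, 0, 1),
  hlthplan = c(1, 1, 1, 1, 1, 1),
  smoke100 = c(0, 1, 1, 0, 0, 0),
  height   = c(70, 64, 60, 66, 61, 64),
  weight   = c(175, 125, 105, 132, 150, 114),
  wtdesire = c(175, 115, 105, 124, 130, 114),
  age      = c(77, 33, 49, 42, 55, 55),
  gender   = c("m", "f", "f", "f", "f", "f"),
  stringsAsFactors = TRUE
)

dim(cdc_toy)                           # number of cases, number of variables
names(cdc_toy)                         # variable names
class(cdc_toy$genhlth)                 # storage class of one variable
var_classes <- sapply(cdc_toy, class)  # storage class of every variable
var_classes
```

Note that the 0/1 indicators (exerany, hlthplan, smoke100) are stored as numbers but are still categorical in meaning, so the storage class alone does not settle the variable type.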
height and age, and compute the interquartile range for each. Compute the relative frequency distribution for gender and exerany. How many males are in the sample? What proportion of the sample reports being in excellent health?
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 48.00 64.00 67.00 67.18 70.00 93.00
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.00 31.00 43.00 45.07 57.00 99.00
## [1] 6
## [1] 26
##
## m f
## 9569 10431
##
## excellent very good good fair poor
## 0.23285 0.34860 0.28375 0.10095 0.03385
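As a check on the numbers above, the interquartile ranges and the gender proportions can be recomputed from the printed quartiles and counts. This is a sketch of the arithmetic only; the object names are hypothetical, and on the full data the same quantities would presumably come from calls such as IQR() and table() applied to the relevant columns.

```r
# Quartiles copied from the two summary() outputs above
height_q <- c(q1 = 64, q3 = 70)
age_q    <- c(q1 = 31, q3 = 57)

# IQR = third quartile minus first quartile
height_iqr <- unname(height_q["q3"] - height_q["q1"])  # 6
age_iqr    <- unname(age_q["q3"] - age_q["q1"])        # 26

# Counts copied from the gender table above
gender_counts <- c(m = 9569, f = 10431)

# Relative frequencies: each count divided by the total
gender_props <- gender_counts / sum(gender_counts)     # m = 0.47845, f = 0.52155
gender_props
```

So there are 9,569 males in the sample, just under 48% of respondents, and about 23.3% of the sample reports excellent health.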
under23_and_smoke that contains all observations of respondents under the age of 23 that have smoked 100 cigarettes in their lifetime. Write the command you used to create the new object as the answer to this exercise.
## genhlth exerany hlthplan smoke100 height weight wtdesire age gender
## 13 excellent 1 0 1 66 185 220 21 m
## 37 very good 1 0 1 70 160 140 18 f
## 96 excellent 1 1 1 74 175 200 22 m
## 180 good 1 1 1 64 190 140 20 f
## 182 very good 1 1 1 62 92 92 21 f
## 240 very good 1 0 1 64 125 115 22 f
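One command that would produce an object like this is a subset() call with both conditions joined by `&`. The sketch below is an assumption about the form of the command, not a record of what was run; it uses a four-row stand-in (cdc_toy, a hypothetical name) made from rows 1, 2, 13 and 37 of the printed output.

```r
# Four rows taken from the printed output: rows 1, 2, 13 and 37 (cdc_toy is hypothetical)
cdc_toy <- data.frame(
  smoke100 = c(0, 1, 1, 1),
  age      = c(77, 33, 21, 18),
  gender   = c("m", "f", "m", "f")
)

# Keep respondents who are under 23 AND have smoked at least 100 cigarettes
under23_and_smoke <- subset(cdc_toy, age < 23 & smoke100 == 1)
under23_and_smoke
```

On the full data the same call would use cdc in place of cdc_toy.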
This shows that those with more favorable health ratings tend to have lower BMIs as a whole, but there is still substantial overlap in BMI among the groups.
I chose “exerany”, thinking that whether or not a person exercised might be related to their BMI. The boxplot does not show dramatic differences between the two groups.
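A sketch of how the BMI comparison could be set up, assuming weight is recorded in pounds and height in inches so that BMI is weight / height^2 * 703. The six-row data frame (cdc_toy, hypothetical) is copied from the printed head of the data, so the resulting plots only illustrate the mechanics and say nothing about the full sample.

```r
# Stand-in copied from the printed head of the data (cdc_toy is hypothetical)
cdc_toy <- data.frame(
  genhlth = factor(c("good", "good", "good", "good", "very good", "very good")),
  exerany = c(0, 0, 1, 1, 0, 1),
  height  = c(70, 64, 60, 66, 61, 64),
  weight  = c(175, 125, 105, 132, 150, 114)
)

# BMI for imperial units: pounds divided by inches squared, times 703
bmi <- (cdc_toy$weight / cdc_toy$height^2) * 703

# Side-by-side boxplots of BMI by general health and by exercise status
boxplot(bmi ~ cdc_toy$genhlth, xlab = "genhlth", ylab = "BMI")
boxplot(bmi ~ cdc_toy$exerany, xlab = "exerany", ylab = "BMI")

# Group medians give a numeric companion to the plots
bmi_median_by_health <- tapply(bmi, cdc_toy$genhlth, median)
bmi_median_by_health
```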
There seems to be a moderately strong, positive linear relationship between weight and desired weight, with a few outliers. A couple of people have very large desired weights.
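The relationship described above can be looked at with a scatterplot and a correlation coefficient. This is a minimal sketch on the six rows printed earlier (the vector names are hypothetical stand-ins for the weight and wtdesire columns), so the value of r here does not describe the full data.

```r
# Current and desired weights from the six printed rows
weight   <- c(175, 125, 105, 132, 150, 114)
wtdesire <- c(175, 115, 105, 124, 130, 114)

# Scatterplot with current weight on the x-axis and desired weight on the y-axis
plot(weight, wtdesire, xlab = "weight", ylab = "wtdesire")

# Pearson correlation as a rough measure of linear association
r <- cor(weight, wtdesire)
r
```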
wtdesire) and current weight (weight). Create this new variable by subtracting the two columns in the data frame and assigning them to a new object called wdiff.

wdiff? If an observation wdiff is 0, what does this mean about the person’s weight and desired weight? What if wdiff is positive or negative?

wdiff in terms of its center, shape, and spread, including any plots you use. What does this tell us about how people feel about their current weight?

More people feel that they are overweight than feel that they are underweight. That is, more people desire to lose weight than to gain weight.
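A sketch of the wdiff calculation, assuming it is defined as desired weight minus current weight. Under that definition a value of 0 means the person is already at their desired weight, a negative value means they want to lose weight, and a positive value means they want to gain. The six values come from the rows printed earlier and only illustrate the mechanics.

```r
# Current and desired weights from the six printed rows
weight   <- c(175, 125, 105, 132, 150, 114)
wtdesire <- c(175, 115, 105, 124, 130, 114)

# Difference between desired and current weight
wdiff <- wtdesire - weight
wdiff

# Center and spread
summary(wdiff)
IQR(wdiff)

# Shape: a histogram shows skew and where the values pile up
hist(wdiff, main = "wdiff", xlab = "wtdesire - weight")

# Share of respondents who want to lose weight (wdiff below 0)
share_lose <- mean(wdiff < 0)
share_lose
```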
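# Split the data by gender
fdata = subset(cdc, gender == "f")
mdata = subset(cdc, gender == "m")
# wdiff is desired weight minus current weight, so negative values mean a desired weight loss
fwdiff = fdata$wtdesire - fdata$weight
mwdiff = mdata$wtdesire - mdata$weight
summary(fwdiff)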
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 68.0 128.0 145.0 151.7 170.0 495.0
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 78.0 165.0 185.0 189.3 210.0 500.0
## [1] 0
## [1] 0
About 72% of women want to lose weight, compared with about 55% of men. Among those who want to lose weight, women typically want to lose more than men do.
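Percentages like these can be obtained without splitting the data frame, by averaging the logical condition wdiff < 0 within each gender. The sketch below runs on the six rows printed earlier (one man, five women), so its output is illustrative only and is not expected to match the 72% and 55% figures.

```r
# Six printed rows: gender, current weight, desired weight
gender   <- c("m", "f", "f", "f", "f", "f")
weight   <- c(175, 125, 105, 132, 150, 114)
wtdesire <- c(175, 115, 105, 124, 130, 114)
wdiff    <- wtdesire - weight

# Proportion in each gender whose desired weight is below their current weight
lose_by_gender <- tapply(wdiff < 0, gender, mean)
lose_by_gender

# Among those who want to lose weight, the typical (median) desired change per gender
wants_loss <- wdiff < 0
median_loss <- tapply(wdiff[wants_loss], gender[wants_loss], median)
median_loss
```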
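weight and determine what proportion of the weights are within one standard deviation of the mean.
# Interval from one standard deviation below to one above the mean weight
sd.1.range = mean(cdc$weight) + c(-1,1)*sd(cdc$weight)
# Keep respondents whose weight falls inside that interval
middle.1.sd = subset(cdc, weight >= sd.1.range[1] & weight <= sd.1.range[2])
# Proportion of the whole sample inside the interval
nrow(middle.1.sd)/nrow(cdc)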
## [1] 0.7076
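The same proportion can also be computed in one step, as the mean of a logical vector that flags weights within one standard deviation of the mean. A minimal sketch on the six weights printed earlier; on the full data the weight column of cdc would take the place of this vector.

```r
# Weights from the six printed rows
weight <- c(175, 125, 105, 132, 150, 114)

# TRUE where a weight lies within one standard deviation of the mean
within_1sd <- abs(weight - mean(weight)) <= sd(weight)

# The mean of a logical vector is the proportion of TRUE values
prop_within <- mean(within_1sd)
prop_within
```

For comparison, a normal distribution puts roughly 68% of its values within one standard deviation of the mean, so the 0.7076 reported above is close to that benchmark.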
Team member | Attendance | Author | Contribution % |
---|---|---|---|
Diego Regules | Yes | Yes | 25% |
Gabriella Cardenas | Yes | No | 25% |
Cheyenne Korf | Yes | No | 25% |
Name of member 4 | Yes / No | Yes / No | 25% |
Total | | | 100% |