height and age, and compute the interquartile range for each. Compute the relative frequency distribution for gender and exerany. How many males are in the sample? What proportion of the sample reports being in excellent health?
##       genhlth        exerany          hlthplan         smoke100
##  excellent:4657   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000
##  very good:6972   1st Qu.:0.0000   1st Qu.:1.0000   1st Qu.:0.0000
##  good     :5675   Median :1.0000   Median :1.0000   Median :0.0000
##  fair     :2019   Mean   :0.7457   Mean   :0.8738   Mean   :0.4721
##  poor     : 677   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000
##                   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000
##      height          weight         wtdesire          age        gender
##  Min.   :48.00   Min.   : 68.0   Min.   : 68.0   Min.   :18.00   m: 9569
##  1st Qu.:64.00   1st Qu.:140.0   1st Qu.:130.0   1st Qu.:31.00   f:10431
##  Median :67.00   Median :165.0   Median :150.0   Median :43.00
##  Mean   :67.18   Mean   :169.7   Mean   :155.1   Mean   :45.07
##  3rd Qu.:70.00   3rd Qu.:190.0   3rd Qu.:175.0   3rd Qu.:57.00
##  Max.   :93.00   Max.   :500.0   Max.   :680.0   Max.   :99.00
While the question asked for summaries of two specific variables, with so few variables it is quicker to get the summary of all of them. That also lets the analyst eyeball things such as desired weight being about 15 lbs lower than actual weight, and how close each distribution is to a normal curve.
## The IQR for the height variable in cdc is 6
## The IQR for the age variable in cdc is 26
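The two IQR values can be checked by hand against the summary table, since the IQR is the third quartile minus the first. The following is a minimal sketch that copies the quartiles from the printed output rather than loading the data; the variable names are illustrative only. With the cdc data frame loaded, `IQR(cdc$height)` and `IQR(cdc$age)` would be the direct route.

```r
# Quartiles copied from the summary output (not recomputed from the raw data)
height_q1 <- 64
height_q3 <- 70
age_q1 <- 31
age_q3 <- 57

# IQR = third quartile minus first quartile
height_iqr <- height_q3 - height_q1
age_iqr <- age_q3 - age_q1

cat("The IQR for the height variable in cdc is", height_iqr, "\n")
cat("The IQR for the age variable in cdc is", age_iqr, "\n")
```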
# Relative frequency of each gender: counts divided by the number of rows
gen = cdc$gender
genfreq = table(gen)
gen.relfreq = genfreq / nrow(cdc)
cbind(gen.relfreq)
## gen.relfreq
## m 0.47845
## f 0.52155
There are 9,569 men in the data, representing 48% of respondents.
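As an alternative to dividing by `nrow(cdc)`, base R's `prop.table()` turns a table of counts into proportions in one step. The sketch below assumes nothing beyond the counts printed in the summary: it rebuilds a stand-in gender vector from them (the object names are hypothetical) so that it runs without the original data.

```r
# Stand-in gender vector rebuilt from the summary counts (m: 9569, f: 10431)
gen <- factor(rep(c("m", "f"), times = c(9569, 10431)), levels = c("m", "f"))

# table() counts each level; prop.table() divides each count by the total
gen_counts <- table(gen)
gen_relfreq <- prop.table(gen_counts)

n_males <- sum(gen == "m")   # number of males in the sample
print(n_males)
print(gen_relfreq)
```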
# Relative frequency of exerany (0 = no exercise, 1 = any exercise)
exercise = cdc$exerany
exercisefreq = table(exercise)
exercise.relfreq = exercisefreq / nrow(cdc)
cbind(exercise.relfreq)
## exercise.relfreq
## 0 0.2543
## 1 0.7457
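Because exerany is coded 0/1, its mean is the proportion of ones, which is why the summary's mean of 0.7457 matches the relative frequency above. A minimal sketch, assuming a stand-in vector rebuilt from the printed proportions (0.2543 and 0.7457 of 20,000 rows imply 5,086 zeros and 14,914 ones):

```r
# Stand-in 0/1 vector consistent with the printed proportions (an assumption,
# reconstructed from the output rather than read from the data)
exerany <- rep(c(0, 1), times = c(5086, 14914))

# For a 0/1 indicator, the mean equals the proportion of ones
prop_exercise <- mean(exerany)
prop_no_exercise <- 1 - prop_exercise

print(c(no = prop_no_exercise, yes = prop_exercise))
```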
# Relative frequency of each self-reported general health category
genhlth = cdc$genhlth
genhlthfreq = table(genhlth)
genhlth.relfreq = genhlthfreq / nrow(cdc)
cbind(genhlth.relfreq)
## genhlth.relfreq
## excellent 0.23285
## very good 0.34860
## good 0.28375
## fair 0.10095
## poor 0.03385
23% of respondents report that they are in excellent health.
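The same proportions can be cross-checked against the genhlth counts in the summary output, without touching the raw data. This is a sketch under that assumption; the object names are illustrative.

```r
# Counts copied from the genhlth column of the summary output
genhlth_counts <- c(excellent = 4657, `very good` = 6972, good = 5675,
                    fair = 2019, poor = 677)

# Each count divided by the total gives the relative frequency
genhlth_relfreq <- genhlth_counts / sum(genhlth_counts)

# Share in excellent health, rounded to a whole-number percentage
excellent_pct <- round(100 * genhlth_relfreq[["excellent"]])
cat(excellent_pct, "% of respondents report excellent health\n", sep = "")
```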