Cases: 20000
Variables: 9
genhlth: categorical, ordinal
exerany: categorical, nominal
hlthplan: categorical, nominal
smoke100: categorical, nominal
height: numerical, discrete
weight: numerical, discrete
wtdesire: numerical, discrete
age: numerical, discrete
gender: categorical, nominal
Create a numerical summary for height and age, and compute the interquartile range for each. Compute the relative frequency distribution for gender and exerany. How many males are in the sample? What proportion of the sample reports being in excellent health?
source("C:/Users/Andrew/Documents/R/win-library/3.1/IS606/labs/Lab1/more/cdc.R")
summary(cdc$height)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 48.00 64.00 67.00 67.18 70.00 93.00
70-64
## [1] 6
summary(cdc$age)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.00 31.00 43.00 45.07 57.00 99.00
57-31
## [1] 26
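The interquartile ranges above are computed by hand from the summary() output. As a cross-check, base R's IQR() returns the same quantity directly. The sketch below uses a small made-up vector (toy_heights is a hypothetical name, not part of cdc) so that it runs on its own.

```r
# Toy data for illustration only; these values are made up, not taken from cdc.
toy_heights <- c(60, 62, 64, 66, 67, 68, 70, 72, 75)

# First and third quartiles via quantile(); names are stripped for arithmetic.
q <- unname(quantile(toy_heights, probs = c(0.25, 0.75)))

# IQR by hand (Q3 - Q1) and via the built-in helper.
iqr_by_hand <- q[2] - q[1]
iqr_builtin <- IQR(toy_heights)
```

With the lab data loaded, IQR(cdc$height) and IQR(cdc$age) would be the equivalent calls.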
table(cdc$gender)/20000
##
## m f
## 0.47845 0.52155
table(cdc$exerany)/20000
##
## 0 1
## 0.2543 0.7457
table(cdc$gender)
##
## m f
## 9569 10431
#males = 9569
table(cdc$genhlth)/20000
##
## excellent very good good fair poor
## 0.23285 0.34860 0.28375 0.10095 0.03385
#excellent = .23285
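The relative frequencies above divide each table by a hard-coded sample size. A possible alternative, sketched here on a made-up factor (toy_gender is a hypothetical name, not part of cdc), is prop.table(), which divides by the table's own total and so cannot drift out of sync with the data.

```r
# Toy factor for illustration only; values are made up, not taken from cdc.
toy_gender <- factor(c("m", "f", "f", "m", "f"), levels = c("m", "f"))

counts   <- table(toy_gender)    # absolute frequencies
rel_freq <- prop.table(counts)   # divides by sum(counts), no hard-coded n
```

With the lab data loaded, prop.table(table(cdc$gender)) would be the equivalent call.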
More men than women report having smoked at least 100 cigarettes in their lifetime.
Create a new object called under23_and_smoke that contains all observations of respondents under the age of 23 who have smoked 100 cigarettes in their lifetime. Write the command you used to create the new object as the answer to this exercise.
under23_and_smoke <- subset(cdc, age < 23 & smoke100 == 1)
summary(under23_and_smoke)
## genhlth exerany hlthplan smoke100
## excellent:110 Min. :0.0000 Min. :0.0000 Min. :1
## very good:244 1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:1
## good :204 Median :1.0000 Median :1.0000 Median :1
## fair : 53 Mean :0.8145 Mean :0.6952 Mean :1
## poor : 9 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1
## Max. :1.0000 Max. :1.0000 Max. :1
## height weight wtdesire age gender
## Min. :59.00 Min. : 85.0 Min. : 80.0 Min. :18.00 m:305
## 1st Qu.:65.00 1st Qu.:130.0 1st Qu.:125.0 1st Qu.:19.00 f:315
## Median :68.00 Median :155.0 Median :150.0 Median :20.00
## Mean :67.92 Mean :158.9 Mean :152.2 Mean :20.22
## 3rd Qu.:71.00 3rd Qu.:180.0 3rd Qu.:175.0 3rd Qu.:21.00
## Max. :79.00 Max. :350.0 Max. :315.0 Max. :22.00
The figure shows boxplots of BMI for each self-reported general health category. It suggests that people with higher BMIs are more likely to report worse general health.
exerany (whether the respondent exercised in the past month) is likely associated with better health and lower BMI, since people who exercise burn more calories and tend to weigh less. As the boxplot shows, those who exercised in the past month have a slightly lower median BMI and a narrower IQR, although there are still many outliers.
bmi <- (cdc$weight / cdc$height^2) * 703
boxplot(bmi ~ cdc$exerany)
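Reading medians and IQRs off a boxplot is approximate. One way to put numbers on the comparison is tapply(), sketched below on a made-up data frame (toy and its values are hypothetical, not taken from cdc). The BMI formula is the same one used above: weight in pounds divided by height in inches squared, times 703.

```r
# Toy data for illustration only; these rows are made up, not taken from cdc.
toy <- data.frame(
  weight  = c(150, 180, 200, 130, 160, 220),  # pounds
  height  = c(65, 70, 68, 62, 66, 69),        # inches
  exerany = c(1, 1, 0, 1, 0, 0)               # 1 = exercised in past month
)

# Same BMI formula as in the lab code.
toy$bmi <- toy$weight / toy$height^2 * 703

# Median and IQR of BMI within each exercise group.
med_by_group <- tapply(toy$bmi, toy$exerany, median)
iqr_by_group <- tapply(toy$bmi, toy$exerany, IQR)
```

With the lab data loaded, tapply(bmi, cdc$exerany, median) would be the equivalent call.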
plot(cdc$weight ~ cdc$wtdesire)
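With current weight on the y-axis and desired weight on the x-axis, the general relationship appears to have a slope above 1, suggesting that people generally weigh more than they would like and want to lose some weight.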
Let's consider a new variable: the difference between desired weight (wtdesire) and current weight (weight). Create this new variable by subtracting the two columns in the data frame and assigning them to a new object called wdiff.
wdiff <- cdc$wtdesire - cdc$weight
What type of data is wdiff? If an observation wdiff is 0, what does this mean about the person’s weight and desired weight? What if wdiff is positive or negative?
wdiff is numerical and discrete.
If wdiff is 0, the respondent is satisfied with their current weight.
If wdiff is negative, they want to lose weight; if it is positive, they want to gain weight.
Describe the distribution of wdiff in terms of its center, shape, and spread, including any plots you use. What does this tell us about how people feel about their current weight?
boxplot(wdiff)
hist(wdiff, breaks = 40)
plot(wdiff)
summary(wdiff)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -300.00 -21.00 -10.00 -14.59 0.00 500.00
The median of wdiff is -10 and the mean is -14.59, so it is typical for people to want to lose around 10 to 15 pounds.
The histogram of wdiff is unimodal with a slight left skew: some people want to lose a lot of weight, and few people want to gain weight.
The middle 50% of values lies between -21 and 0 pounds (IQR = 21), although there are many outliers, mostly people who want to lose weight.
genwdiff <- data.frame(wdiff, cdc$gender)
summary(subset(genwdiff, cdc.gender == "m"))
## wdiff cdc.gender
## Min. :-300.00 m:9569
## 1st Qu.: -20.00 f: 0
## Median : -5.00
## Mean : -10.71
## 3rd Qu.: 0.00
## Max. : 500.00
summary(subset(genwdiff, cdc.gender == "f"))
## wdiff cdc.gender
## Min. :-300.00 m: 0
## 1st Qu.: -27.00 f:10431
## Median : -10.00
## Mean : -18.15
## 3rd Qu.: 0.00
## Max. : 83.00
boxplot(genwdiff$wdiff ~ genwdiff$cdc.gender)
Women (median = -10) generally appear to want to lose a few more pounds than men (median = -5), and women show a slightly wider spread in how much they want to lose or gain (IQR = 27) than men (IQR = 20). Interestingly, more men than women appear to want to gain weight.
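The observation that more men than women want to gain weight is read off the boxplot. A possible numeric check, sketched below on made-up values (toy_wdiff and toy_gender are hypothetical names, not part of cdc), is to take the mean of a logical vector within each group, which gives the share of respondents with a positive difference.

```r
# Toy data for illustration only; these values are made up, not taken from cdc.
toy_wdiff  <- c(-20, 0, 10, -5, -30, -10, 0, -5)
toy_gender <- factor(c("m", "m", "m", "m", "f", "f", "f", "f"),
                     levels = c("m", "f"))

# Share of each group with wdiff > 0 (wants to gain weight):
# the mean of a logical vector is the proportion of TRUE values.
share_gain <- tapply(toy_wdiff > 0, toy_gender, mean)

# Median difference per group, without building separate subsets.
median_by_gender <- tapply(toy_wdiff, toy_gender, median)
```

With the lab data loaded, tapply(wdiff > 0, cdc$gender, mean) would be the equivalent call.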
Find the mean and standard deviation of weight and determine what proportion of the weights are within one standard deviation of the mean.
avgwt <- mean(cdc$weight)
sdwt <- sd(cdc$weight)
instdev <- subset(cdc, weight < (avgwt + sdwt) & weight > (avgwt - sdwt))
dim(instdev)[1]/dim(cdc)[1]
## [1] 0.7076
Mean of weight = 169.7
Standard deviation = 40.08
Proportion within one standard deviation of the mean = 0.7076
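The proportion above comes from counting the rows of a subset. A shorter equivalent, sketched here on a made-up vector (toy_weight is a hypothetical name, not part of cdc), takes the mean of a logical vector. Like the subset() call, it uses strict inequalities, so values exactly one standard deviation from the mean are excluded.

```r
# Toy data for illustration only; these values are made up, not taken from cdc.
toy_weight <- c(120, 140, 150, 160, 170, 180, 190, 250)

avg <- mean(toy_weight)
s   <- sd(toy_weight)   # sample standard deviation (n - 1 denominator)

# TRUE where the value lies strictly within one sd of the mean.
within_one_sd <- abs(toy_weight - avg) < s

# Proportion of TRUE values.
prop_within <- mean(within_one_sd)

# Same result via the subset-and-count approach used in the lab code.
prop_subset <- sum(toy_weight < avg + s & toy_weight > avg - s) /
  length(toy_weight)
```

With the lab data loaded, mean(abs(cdc$weight - avgwt) < sdwt) would be the equivalent call.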