setwd("C:\\Users\\26291\\Documents\\Data606_lab1")
source("http://www.openintro.org/stat/data/cdc.R")
names(cdc)
## [1] "genhlth" "exerany" "hlthplan" "smoke100" "height" "weight"
## [7] "wtdesire" "age" "gender"
plot(cdc$weight ~ cdc$wtdesire)
wdif <- (cdc$wtdesire - cdc$weight)
If wdif is negative, the respondent wants to lose weight; if it is positive, they want to gain weight; if it is zero, they are already at their desired weight.
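As a minimal sketch of that sign convention, the snippet below labels each respondent by the sign of the difference. The vectors `weight`, `wtdesire`, `wdif_toy`, and `goal` are hypothetical toy values made up for illustration; they are not taken from the cdc data.

```r
# Hypothetical toy values (pounds), not taken from the cdc data
weight   <- c(180, 150, 200, 135)
wtdesire <- c(165, 150, 185, 140)

# Same definition as wdif above: desired weight minus current weight
wdif_toy <- wtdesire - weight

# Label each respondent by the sign of the difference
goal <- ifelse(wdif_toy < 0, "lose",
        ifelse(wdif_toy > 0, "gain", "maintain"))

# Count how many respondents fall in each group
table(goal)
```

The same labelling could be applied to the full wdif vector to count how many respondents fall in each group.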
boxplot(wdif)
hist(wdif, breaks = 40)
plot(wdif)
summary(wdif)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## -300.00  -21.00  -10.00  -14.59    0.00  500.00
The histogram of wdif is unimodal with a slight left skew: the mean (-14.59) lies below the median (-10), because some people want to lose a lot of weight while few want to gain weight.
The middle 50% of the values (the IQR) runs from -21 to 0 pounds, a width of 21 pounds. There are many outliers, mostly people who want to lose weight.
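The quartiles and the outlier cut-offs can also be computed directly rather than read off the plots. This is a rough sketch on a hypothetical toy vector `x` (made-up values, not the cdc data), using the common 1.5 * IQR rule. Note that `boxplot()` works from hinges, which can differ slightly from `quantile()`, so the flagged points may not match a boxplot exactly.

```r
# Hypothetical toy differences (pounds), not taken from the cdc data
x <- c(-60, -25, -20, -10, -10, -5, 0, 0, 0, 40)

# First and third quartiles, and the width of the middle 50%
q <- quantile(x, c(0.25, 0.75))
iqr_width <- IQR(x)

# Mean below the median suggests a left skew
c(mean = mean(x), median = median(x))

# Flag points more than 1.5 * IQR beyond the quartiles
lower <- q[1] - 1.5 * iqr_width
upper <- q[2] + 1.5 * iqr_width
outliers <- x[x < lower | x > upper]
outliers
```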
difwdif <- data.frame(wdif, cdc$gender)
summary(subset(difwdif, cdc.gender == "m"))
## wdif cdc.gender
## Min. :-300.00 m:9569
## 1st Qu.: -20.00 f: 0
## Median : -5.00
## Mean : -10.71
## 3rd Qu.: 0.00
## Max. : 500.00
summary(subset(difwdif, cdc.gender == "f"))
## wdif cdc.gender
## Min. :-300.00 m: 0
## 1st Qu.: -27.00 f:10431
## Median : -10.00
## Mean : -18.15
## 3rd Qu.: 0.00
## Max. : 83.00
boxplot(difwdif$wdif ~ difwdif$cdc.gender)
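The two `subset()` calls above could also be collapsed into one grouped summary. The sketch below assumes a data frame shaped like difwdif; `toy` and `med` are hypothetical names and the values are made up for illustration.

```r
# Hypothetical toy data frame mirroring the columns of difwdif
toy <- data.frame(
  wdif       = c(-20, -5, 0, 10, -30, -10, -10, 0),
  cdc.gender = factor(c("m", "m", "m", "m", "f", "f", "f", "f"),
                      levels = c("m", "f"))
)

# One summary per gender in a single call
tapply(toy$wdif, toy$cdc.gender, summary)

# Or only the medians, one per gender
med <- tapply(toy$wdif, toy$cdc.gender, median)
med
```

With the real data, `tapply(difwdif$wdif, difwdif$cdc.gender, summary)` should give the same numbers as the two separate summaries, without the empty factor level in each table.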
avg_wt <- mean(cdc$weight)
avg_wt
## [1] 169.683
sd_wt <- sd(cdc$weight)
sd_wt
## [1] 40.08097
weight_within_one_sd <- subset(cdc, weight < (avg_wt + sd_wt) & weight > (avg_wt - sd_wt))
dim(weight_within_one_sd)
## [1] 14152 9
nrow(weight_within_one_sd) / nrow(cdc)
## [1] 0.7076
The proportion is 14152/20000 = 0.7076, so about 71% of the weights are within one standard deviation of the mean.
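The same proportion can be found without building a subset, by taking the mean of a logical vector. This is a minimal sketch: `prop_within_sd` is a hypothetical helper (not part of the lab) and `toy_wt` holds made-up weights. The last line gives the value a normal model would predict, for comparison; skewed data such as weights need not match it.

```r
# Hypothetical helper: share of values strictly within k standard
# deviations of the mean (same strict inequalities as the subset above)
prop_within_sd <- function(x, k = 1) {
  mean(abs(x - mean(x)) < k * sd(x))
}

# Toy weights in pounds, made up for illustration
toy_wt <- c(120, 150, 160, 165, 170, 175, 180, 190, 220, 270)
prop_within_sd(toy_wt)

# Benchmark under a normal model: about 0.683 for one standard deviation
2 * pnorm(1) - 1
```

Applied to the lab data, `prop_within_sd(cdc$weight)` should reproduce the 0.7076 computed above.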