Who would like to lose weight? How much? The Behavioral Risk Factor Surveillance System (BRFSS) asked men and women what they thought their ideal and current weights were. Is there a difference between men and women in how they picture their ideal weights? What about for heavier vs. lighter people?
library(ggplot2)
cdc_data<-source("more/cdc.R")
cdcStudy<-data.frame(cdc_data$value)
names(cdcStudy)
## [1] "genhlth" "exerany" "hlthplan" "smoke100" "height" "weight"
## [7] "wtdesire" "age" "gender"
head(cdcStudy)
## genhlth exerany hlthplan smoke100 height weight wtdesire age gender
## 1 good 0 1 0 70 175 175 77 m
## 2 good 0 1 1 64 125 115 33 f
## 3 good 1 1 1 60 105 105 49 f
## 4 good 1 1 0 66 132 124 42 f
## 5 very good 0 1 0 61 150 130 55 f
## 6 very good 1 1 0 64 114 114 55 f
cdcStudy$ratio<-(cdcStudy$weight/cdcStudy$wtdesire)
The following graph plots respondents' desired weights against their current weights. Respondents on the line are at their desired weight, and those below it would like to weigh less. Darker points indicate respondents who would like to lose more weight relative to their desired weight, measured by the ratio of current to desired weight.
ggplot(cdcStudy,aes(x=weight, y=wtdesire))+geom_point(aes(col=ratio))+
scale_colour_gradient2()+xlim(0,400)+ylim(0,400)+ theme(legend.position="none")+geom_abline(slope=1, intercept=0)

ggplot(cdcStudy,aes(x=weight, y=wtdesire))+geom_point(aes(col=ratio))+
scale_colour_gradient2()+geom_abline(slope=1, intercept=0)+facet_grid(gender~.)+xlim(0,400)+ylim(0,400)+ theme(legend.position="none")

cdcStudy$wdiff<-(cdcStudy$wtdesire-cdcStudy$weight)
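The variable wdiff is desired weight minus current weight, so negative values mean the respondent would like to weigh less. Its distribution is strongly leptokurtic: sharply peaked with heavy tails. While many people feel that they weigh more than their ideal, more than half put their ideal weight within 20 pounds of their current weight. This shows in the histogram and summary statistics of wdiff. The median of wdiff is -10.00, and the first and third quartiles are at -21.00 and 0.00. The mean, at -14.59, is less than the median, which points to a long left tail of respondents who want to lose a great deal of weight.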
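The kind of check described in this paragraph can be sketched as follows. This is a minimal illustration, not the original analysis code: `excess_kurtosis` is a hypothetical helper (base R ships no kurtosis function), and `wdiff_demo` is a small made-up vector standing in for `cdcStudy$wdiff`.

```r
# Hypothetical helper: moment-based excess kurtosis.
# Values above 0 indicate heavier tails than a normal distribution.
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  m2 <- mean(centered^2)  # second central moment
  m4 <- mean(centered^4)  # fourth central moment
  m4 / m2^2 - 3
}

# Made-up illustrative values, NOT the BRFSS data
wdiff_demo <- c(-120, -30, -20, -10, -10, -5, 0, 0, 0, 5)

summary(wdiff_demo)          # quartiles, median, and mean
excess_kurtosis(wdiff_demo)  # positive for this heavy-tailed sample
```

On the real data, the same calls would be `summary(cdcStudy$wdiff)` and `excess_kurtosis(cdcStudy$wdiff)`, with `hist(cdcStudy$wdiff)` for the histogram.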
Men are more likely than women to believe their weight is close to ideal. On average, men's desired weight is 10.7 pounds below their current weight, while women's is 18.2 pounds below. In the final graph, it is clear that women, at all weights, tend to believe they are further above their ideal weight than men do.
CDCGrouped <-aggregate(wdiff ~ gender, cdcStudy, mean)
CDCGrouped
## gender wdiff
## 1 m -10.70613
## 2 f -18.15118
ggplot(cdcStudy,aes(x=gender,y=wdiff)) + geom_boxplot()

ggplot(cdcStudy,aes(x=weight, y=wdiff))+geom_point(aes(col=gender))+ scale_color_manual(values=c("#26f2d7", "#0c49f2")) +xlim(0,550)+ylim(-350,150)

mean(cdcStudy$weight)
## [1] 169.683
sd(cdcStudy$weight)
## [1] 40.08097
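# Bounds one standard deviation either side of the mean weight
maxint<-mean(cdcStudy$weight)+sd(cdcStudy$weight)
minint<-mean(cdcStudy$weight)-sd(cdcStudy$weight)
# Respondents below the upper bound, and respondents below the lower bound
maxnumber<-subset(cdcStudy,weight<maxint)
minnumber<-subset(cdcStudy,weight<minint)
# Their difference is the count within one standard deviation of the mean
pctwithin<-100*(nrow(maxnumber)-nrow(minnumber))/nrow(cdcStudy)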
70.76% of the weights are within 1 standard deviation of the mean. That's close to 68%, the percentage expected for a normal distribution.
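The comparison with the normal benchmark can be sketched as below. The helper `share_within_sd` and the simulated vector `weights_demo` are assumptions introduced for illustration; the simulated values only stand in for `cdcStudy$weight` and are not the survey data.

```r
# Hypothetical helper: share of values within k standard deviations of the mean
share_within_sd <- function(x, k = 1) {
  x <- x[!is.na(x)]
  mean(abs(x - mean(x)) < k * sd(x))
}

# Theoretical share for any normal distribution, about 68.27%
normal_share <- pnorm(1) - pnorm(-1)
round(100 * normal_share, 2)

# Simulated stand-in for cdcStudy$weight (illustration only)
set.seed(1)
weights_demo <- rnorm(20000, mean = 170, sd = 40)
share_within_sd(weights_demo)
```

Calling `share_within_sd(cdcStudy$weight)` on the real data should give the proportion reported in the text.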
The final question is whether people who weigh more are more likely to believe they need to lose weight. That would make sense, and the graphs suggest it: in the first graph, points get darker toward the right, and the final graph, of wdiff against weight, has a definite downward slope. The regression below confirms the relationship: the slope of about -0.36 is significant at the .001 level. However, the coefficient of determination is 0.36, so current weight accounts for only about 36% of the variation in wdiff.
lmresults<-lm(cdcStudy$wdiff~cdcStudy$weight)
summary(lmresults)
##
## Call:
## lm(formula = cdcStudy$wdiff ~ cdcStudy$weight)
##
## Residuals:
## Min 1Q Median 3Q Max
## -167.98 -9.32 0.08 11.51 518.31
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 46.664015 0.590782 78.99 <2e-16 ***
## cdcStudy$weight -0.360986 0.003388 -106.53 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19.21 on 19998 degrees of freedom
## Multiple R-squared: 0.3621, Adjusted R-squared: 0.362
## F-statistic: 1.135e+04 on 1 and 19998 DF, p-value: < 2.2e-16
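To make the fitted line concrete, the short sketch below plugs in the coefficients printed in the summary. The numbers are copied from the output; the variable names and the example weights are assumptions chosen for illustration. It also shows that, with a single predictor, the reported R-squared and F-statistic follow directly from the slope's t value.

```r
# Values copied from the regression summary
intercept <- 46.664015
slope     <- -0.360986
t_slope   <- -106.53
df_resid  <- 19998

# Current weight at which the fitted line predicts wdiff = 0
# (about 129 pounds; heavier respondents are predicted to want to lose weight)
break_even <- -intercept / slope
break_even

# Predicted wdiff at a few illustrative current weights
weights <- c(120, 170, 220)
intercept + slope * weights

# With one predictor: R^2 = t^2 / (t^2 + residual df), and F = t^2
r_squared <- t_slope^2 / (t_slope^2 + df_resid)
f_stat    <- t_slope^2
round(r_squared, 3)
```

Each additional pound of current weight is associated with a desired change about 0.36 pounds more negative, which is the downward slope seen in the scatterplot.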