On Your Own

source("http://www.openintro.org/stat/data/cdc.R")

Make a scatterplot of weight versus desired weight. Describe the relationship between these two variables.

plot(cdc$weight, cdc$wtdesire)

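The scatterplot shows a positive, roughly linear association: people who weigh more tend to report a higher desired weight. For most respondents the desired weight is at or below the actual weight. The gap between the two is small at lower weights and grows as weight increases.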
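
One way to check the scatterplot's impression numerically is to count how many respondents fall below, on, or above the line where desired weight equals actual weight. The following is a minimal sketch on a small made-up data frame called `toy` (hypothetical values, not the CDC data); with the lab data, `cdc$weight` and `cdc$wtdesire` would take the place of the toy columns.

```r
# Hypothetical toy data standing in for cdc$weight and cdc$wtdesire.
toy <- data.frame(
  weight   = c(120L, 150L, 180L, 200L, 250L, 130L),
  wtdesire = c(120L, 140L, 160L, 185L, 200L, 135L)
)

# Classify each respondent relative to the line wtdesire = weight.
position <- ifelse(toy$wtdesire < toy$weight, "below",
            ifelse(toy$wtdesire > toy$weight, "above", "on"))

# Share of respondents in each group.
shares <- prop.table(table(position))
print(shares)

# Strength of the linear association between the two variables.
r <- cor(toy$weight, toy$wtdesire)
print(r)
```

A large "below" share together with a correlation near 1 would support the reading that desired weight rises with actual weight but usually stays under it.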

Let’s consider a new variable: the difference between desired weight (wtdesire) and current weight (weight). Create this new variable by subtracting the two columns in the data frame and assigning them to a new object called wdiff.

wdiff <- cdc$wtdesire - cdc$weight

What type of data is wdiff? If an observation wdiff is 0, what does this mean about the person’s weight and desired weight? What if wdiff is positive or negative?

wdiff is a numerical (discrete) variable. R stores it as an integer vector because weight and wtdesire are both integer columns, and the difference of two integers is an integer. If wdiff is 0, the person’s desired weight equals their current weight, so they are satisfied with their weight. If wdiff is negative, they want to lose weight; if it is positive, they want to gain weight.
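
The type and sign questions can both be checked directly in R. This is a small sketch with made-up vectors (`wtdesire_toy`, `weight_toy`, and the `goal` labels are hypothetical names and values, not part of the lab data).

```r
# Subtracting two integer vectors yields an integer vector.
wtdesire_toy <- c(150L, 120L, 200L)   # hypothetical desired weights
weight_toy   <- c(170L, 120L, 190L)   # hypothetical current weights
wdiff_toy    <- wtdesire_toy - weight_toy

class(wdiff_toy)        # "integer"
is.numeric(wdiff_toy)   # TRUE

# Translate the sign of each difference into a plain-language label.
goal <- ifelse(wdiff_toy < 0, "wants to lose",
        ifelse(wdiff_toy > 0, "wants to gain", "satisfied"))
print(data.frame(wdiff_toy, goal))
```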

Describe the distribution of wdiff in terms of its center, shape, and spread, including any plots you use. What does this tell us about how people feel about their current weight?

summary(wdiff)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -300.00  -21.00  -10.00  -14.59    0.00  500.00
boxplot(wdiff)

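The median of wdiff is -10 and the mean is -14.59, so the typical respondent wants to weigh somewhat less than they do now. The middle 50% of values lie between -21 and 0, an IQR of 21, so most desired changes are small. Because the third quartile is 0, at least 75% of respondents want to lose weight or stay the same, and at most 25% want to gain. The mean sits below the median, which suggests a left skew, and the extremes (minimum -300, maximum 500) lie far outside the box, so there are outliers on both sides.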
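
Quartiles alone do not separate "wants to lose" from "wants no change", so it can help to compute those shares explicitly. The sketch below uses a made-up vector `wdiff_toy` (hypothetical values); with the lab data, the same lines would be applied to `wdiff`.

```r
# Hypothetical differences (desired minus current weight), in pounds.
wdiff_toy <- c(-40L, -25L, -20L, -10L, -10L, -5L, 0L, 0L, 0L, 15L)

center <- median(wdiff_toy)   # robust measure of center
spread <- IQR(wdiff_toy)      # width of the middle 50%

# Shares of respondents by direction of desired change.
# mean() of a logical vector gives the proportion of TRUE values.
share_lose <- mean(wdiff_toy < 0)
share_same <- mean(wdiff_toy == 0)
share_gain <- mean(wdiff_toy > 0)

print(c(lose = share_lose, same = share_same, gain = share_gain))
```

The three shares always sum to 1, which makes it easy to see how much of the "at or below zero" group is made up of people who want no change at all.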

Using numerical summaries and a side-by-side box plot, determine if men tend to view their weight differently than women.

cdc2 <- cbind(cdc, wdiff)
cdc2male <- subset(cdc2, gender == "m")
cdc2female <- subset(cdc2, gender == "f")
summary(cdc2male$wdiff)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -300.00  -20.00   -5.00  -10.71    0.00  500.00
summary(cdc2female$wdiff)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -300.00  -27.00  -10.00  -18.15    0.00   83.00
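boxplot(wdiff ~ cdc$gender)

Both groups have a third quartile of 0, but women tend to want to lose more weight than men: the median wdiff is -10 for women versus -5 for men, the first quartile is -27 versus -20, and the mean is -18.15 versus -10.71. The largest desired gains belong to men (maximum 500 versus 83 for women).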
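
The same group comparison can be made without building a separate subset for each gender. This is a minimal sketch using `tapply()` on a made-up data frame `toy` (hypothetical values; only the "m"/"f" coding is taken from the lab data).

```r
# Hypothetical toy data; gender uses the same "m"/"f" coding as the lab data.
toy <- data.frame(
  gender = factor(c("m", "m", "m", "f", "f", "f")),
  wdiff  = c(-5L, 0L, -20L, -10L, -30L, 0L)
)

# One five-number summary (plus mean) per gender.
by_gender <- tapply(toy$wdiff, toy$gender, summary)
print(by_gender)

# Group medians and IQRs for a compact side-by-side comparison.
medians <- tapply(toy$wdiff, toy$gender, median)
iqrs    <- tapply(toy$wdiff, toy$gender, IQR)
print(rbind(median = medians, IQR = iqrs))
```

Applied to the lab data, `tapply(cdc2$wdiff, cdc2$gender, summary)` should reproduce the two summaries shown above in a single call.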

Now it’s time to get creative. Find the mean and standard deviation of weight and determine what proportion of the weights are within one standard deviation of the mean.

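# Mean and standard deviation of weight
cdcmean <- mean(cdc2$weight)
cdcsd <- sd(cdc2$weight)
# Bounds one standard deviation either side of the mean
cdclower <- cdcmean - cdcsd
cdcupper <- cdcmean + cdcsd
# TRUE for weights strictly inside the bounds
cdc1sd <- cdc2$weight > cdclower & cdc2$weight < cdcupper
cdc2sd <- cbind(cdc2, cdc1sd)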
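summary(cdc2sd$cdc1sd)
##    Mode   FALSE    TRUE 
## logical    5848   14152

Of the 20000 weights, 14152 fall within one standard deviation of the mean, a proportion of 14152/20000 ≈ 0.708, or about 71%.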
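
The proportion can also be obtained in one step, because the mean of a logical vector is the share of TRUE values. The sketch below wraps the calculation in a helper; `prop_within_sd` and `weights_toy` are hypothetical names and values introduced only for illustration.

```r
# Proportion of values strictly within k standard deviations of the mean.
# `prop_within_sd` is a hypothetical helper name, not part of the lab.
prop_within_sd <- function(x, k = 1) {
  m <- mean(x)
  s <- sd(x)
  mean(x > m - k * s & x < m + k * s)
}

# Hypothetical toy weights, in pounds.
weights_toy <- c(120, 140, 150, 160, 170, 180, 200, 260)

p1 <- prop_within_sd(weights_toy)          # within one SD
p2 <- prop_within_sd(weights_toy, k = 2)   # within two SDs
print(c(one_sd = p1, two_sd = p2))
```

Because the helper uses the same strict inequalities as the code above, `prop_within_sd(cdc$weight)` should match the 14152/20000 count obtained from the summary.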