Lab install

library('DATA606')   # provides startLab(); it must be loaded before startLab() is called
#getLabs()
startLab('Lab1')

Lab 1: load the data for the On Your Own section

setwd("~/GitHub/MSDA_JM/DATA606/Week1/Lab1")
source("more/cdc.R")

On Your Own

plot(weight ~ wtdesire, data = cdc, xlab = "Desired weight (lb)", ylab = "Weight (lb)")

ANSWER: The relationship is positive and roughly linear: people who weigh more tend to desire a higher weight.
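To put a number on the "roughly linear" impression, one option is a correlation coefficient plus a least-squares line. This is a minimal sketch on a small hypothetical data frame, `toy`, whose values are made up to stand in for `cdc`; on the real data the equivalent calls would be `cor(cdc$weight, cdc$wtdesire)` and `lm(weight ~ wtdesire, data = cdc)`.

```r
# Hypothetical toy data standing in for the cdc data frame (NOT survey values)
toy <- data.frame(
  weight   = c(120, 150, 175, 200, 230, 260),
  wtdesire = c(115, 140, 160, 185, 200, 220)
)

# Pearson correlation: values near 1 indicate a strong positive linear relationship
r <- cor(toy$weight, toy$wtdesire)

# Least-squares line of weight on desired weight, matching plot(weight ~ wtdesire)
fit <- lm(weight ~ wtdesire, data = toy)

print(round(r, 3))
print(coef(fit))
```

A positive slope together with a correlation close to 1 supports describing the scatterplot as a positive linear relationship.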

  • Let’s consider a new variable: the difference between desired weight (wtdesire) and current weight (weight). Create this new variable by subtracting the two columns in the data frame and assigning the result to a new object called wdiff.
wdiff <- cdc$wtdesire - cdc$weight
summary(wdiff)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -300.00  -21.00  -10.00  -14.59    0.00  500.00
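Keeping the difference inside the data frame, rather than as a separate vector, keeps it row-aligned with other variables such as `gender`. A minimal sketch, assuming a hypothetical toy data frame with the same column names as `cdc` (values made up); with the real data the line would be `cdc$wdiff <- cdc$wtdesire - cdc$weight`.

```r
# Hypothetical toy data standing in for cdc (values are made up)
toy <- data.frame(
  gender   = c("m", "f", "m", "f"),
  weight   = c(180, 150, 200, 130),
  wtdesire = c(170, 140, 200, 135)
)

# Store the difference as a column so it stays aligned with the other variables
toy$wdiff <- toy$wtdesire - toy$weight

print(toy)
print(summary(toy$wdiff))
```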
  • What type of data is wdiff? If an observation wdiff is 0, what does this mean about the person’s weight and desired weight? What if wdiff is positive or negative?

ANSWER: wdiff is numerical data. Because the weights are recorded in whole pounds, its values are integers, so as recorded it is discrete. If wdiff is 0, the person's current weight equals their desired weight.

If wdiff is positive, the person wants to gain wdiff pounds; if it is negative, the person wants to lose |wdiff| pounds.
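The sign rule in the answer can be applied to every respondent at once. A sketch, assuming a hypothetical helper `classify_wdiff()` (not part of the lab or of the DATA606 package) and made-up example values; `table(classify_wdiff(wdiff))` would tabulate the real data.

```r
# Label each difference by what the respondent wants (hypothetical helper)
classify_wdiff <- function(wdiff) {
  factor(
    ifelse(wdiff > 0, "gain", ifelse(wdiff < 0, "lose", "at goal")),
    levels = c("lose", "at goal", "gain")
  )
}

# Hypothetical example values (not from the cdc data)
example <- c(-25, 0, 10, -5, 0)
labels <- classify_wdiff(example)
print(table(labels))
```

Fixing the factor levels keeps the table in a sensible order (lose, at goal, gain) instead of alphabetical order.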

  • Describe the distribution of wdiff in terms of its center, shape, and spread, including any plots you use. What does this tell us about how people feel about their current weight?
summary(wdiff)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -300.00  -21.00  -10.00  -14.59    0.00  500.00
range(wdiff)
## [1] -300  500
hist(wdiff, breaks=100)

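ANSWER: Center: the median of wdiff is -10 lb and the mean is about -14.6 lb, both below 0. Spread: the middle 50% of values lie between -21 and 0 lb (IQR = 21 lb), but the range runs from -300 to 500 lb, so there are extreme outliers in both directions. Shape: the bulk of the distribution sits just below 0, and the mean falling below the median is consistent with a longer left tail. Since the median is negative and the third quartile is 0, at least half of the respondents want to lose weight and at most a quarter want to gain weight, so most people would like to weigh less than they do.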
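The center and spread statements can be reproduced numerically in one call. The helper `describe_wdiff()` below is hypothetical and the example vector is made up; calling `describe_wdiff(wdiff)` on the lab's data would return the corresponding figures for the survey.

```r
# Numerical summaries for the center, spread, and sign of a wdiff-like vector
describe_wdiff <- function(wdiff) {
  c(
    mean      = mean(wdiff),
    median    = median(wdiff),
    sd        = sd(wdiff),
    iqr       = IQR(wdiff),
    prop_lose = mean(wdiff < 0),  # share who want to lose weight
    prop_gain = mean(wdiff > 0)   # share who want to gain weight
  )
}

# Hypothetical example values (not from the cdc data)
example <- c(-40, -20, -10, -10, -5, 0, 0, 15)
stats <- describe_wdiff(example)
print(round(stats, 2))
```

Taking `mean()` of a logical vector gives the proportion of TRUE values, which is a compact way to get the shares wanting to lose or gain weight.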

  • Using numerical summaries and a side-by-side box plot, determine if men tend to view their weight differently than women.

    # wdiff was computed above from the same rows of cdc, so it can be split by gender
    wdiff_men <- wdiff[cdc$gender == "m"]
    wdiff_women <- wdiff[cdc$gender == "f"]
    summary(wdiff_men)
    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    ## -300.00  -20.00   -5.00  -10.71    0.00  500.00
    summary(wdiff_women)
    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    ## -300.00  -27.00  -10.00  -18.15    0.00   83.00
    # Label the boxes so the side-by-side plot is readable
    boxplot(wdiff_men, wdiff_women, names = c("Men", "Women"), ylab = "wdiff (lb)")

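ANSWER: Women tend to want to lose more weight than men. Their median wdiff is -10 lb versus -5 lb for men, their mean is -18.15 lb versus -10.71 lb, and their first quartile is -27 lb versus -20 lb, so in the box plot the women's box and median sit lower than the men's.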
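`tapply()` computes a statistic within each group in one call, which avoids building a separate vector per gender. A minimal sketch on a hypothetical toy data frame (made-up values); on the lab data the analogous call would be `tapply(wdiff, cdc$gender, median)`.

```r
# Hypothetical toy data standing in for cdc (values are made up)
toy <- data.frame(
  gender = c("m", "m", "m", "f", "f", "f"),
  wdiff  = c(-5, 0, -15, -10, -30, -20)
)

# Median and mean of wdiff within each gender group (groups sorted: "f", "m")
medians <- tapply(toy$wdiff, toy$gender, median)
means   <- tapply(toy$wdiff, toy$gender, mean)
print(medians)
print(means)

# With the lab's wdiff and cdc, a formula gives labelled side-by-side boxes:
# boxplot(wdiff ~ cdc$gender)
```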

  • Now it’s time to get creative. Find the mean and standard deviation of weight and determine what proportion of the weights are within one standard deviation of the mean.
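# Use new names so the mean() and sd() functions are not masked
mean_wt <- mean(cdc$weight)
sd_wt <- sd(cdc$weight)
within_1sd <- subset(cdc, weight >= mean_wt - sd_wt & weight <= mean_wt + sd_wt)
nrow(within_1sd) / nrow(cdc)
## [1] 0.7076

ANSWER: About 70.8% of the weights lie within one standard deviation of the mean.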
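The one-standard-deviation check generalizes to any k, and the result can be compared with the share a normal distribution would put in the same interval, pnorm(k) - pnorm(-k), which is about 0.683 for k = 1; the lab's 0.7076 is somewhat above that. The helper names below are hypothetical and the example weights are made up; `prop_within_k_sd(cdc$weight, 1)` uses the same inclusive bounds as the subset() call above.

```r
# Proportion of values within k standard deviations of the mean (inclusive bounds)
prop_within_k_sd <- function(x, k = 1) {
  m <- mean(x)
  s <- sd(x)
  mean(x >= m - k * s & x <= m + k * s)
}

# Share expected within k SDs if the data were exactly normal
normal_within_k_sd <- function(k = 1) pnorm(k) - pnorm(-k)

# Hypothetical example weights (not from the cdc data)
example <- c(110, 125, 140, 150, 160, 175, 190, 240)
print(prop_within_k_sd(example, k = 1))
print(round(normal_within_k_sd(1), 4))
```

Comparing the observed proportion with the normal benchmark is a quick, informal check on whether the weights look approximately normal.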