7.24 Nutrition at Starbucks, Part I. The scatterplot below shows the relationship between the number of calories and amount of carbohydrates (in grams) Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we are interested in predicting the amount of carbs a menu item has based on its calorie content.
  1. Describe the relationship between number of calories and amount of carbohydrates (in grams) that Starbucks food menu items contain.

Menu items with more calories tend to contain more grams of carbohydrates. The relationship is positive, with a slope on the order of 0.1 (a line through the data passes roughly through (100, 20) and (300, 40)) and an intercept around 18. The cluster of low-calorie, low-carb points in the lower left corner of the plot appears to exert high leverage, which could mean the linear relationship is weaker than it first appears.

  1. In this scenario, what are the explanatory and response variables?

Calories are the explanatory variable, and carbohydrates are the response variable. We are using calories to predict carbohydrate content.

  1. Why might we want to fit a regression line to these data?

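To quantify the relationship and use it for prediction: a fitted line shows whether the amount of carbohydrates (including sugar and sweeteners) increases with caloric content, and it lets us predict an item's carbohydrates from the calorie count that Starbucks displays.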

  1. Do these data meet the conditions required for fitting a least squares line? The conditions are linearity, nearly normal residuals, constant variability, and independent observations.

Linearity: there may not be a strong linear relationship due to the influential points.

Nearly normal residuals: the residuals follow a unimodal distribution, but it is skewed to the left (toward negative residuals) and may not be normal.

Constant variability: the residuals are more variable for menu items above 350 calories; this is concerning and should be investigated.

Independent observations: there is no obvious dependence among menu items in their nutritional content, and we must assume that time is not a factor.

I don’t believe the conditions of linearity and constant variability are met, and nearly normal residuals may also be at issue.
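These conditions are usually checked with residual plots. The following is a minimal sketch in R, assuming simulated stand-in data because the Starbucks dataset is not reproduced here; the data frame `menu`, its columns `calories` and `carb`, and the sample size are all hypothetical.

```r
# Simulated stand-in data; these are NOT the real Starbucks menu values.
set.seed(1)
n <- 75  # arbitrary sample size for the illustration
menu <- data.frame(calories = runif(n, min = 80, max = 500))
# Noise grows with calories to mimic the widening spread discussed above.
menu$carb <- 10 + 0.1 * menu$calories + rnorm(n, sd = 0.02 * menu$calories)

fit <- lm(carb ~ calories, data = menu)

# Residuals against the explanatory variable: a curve suggests non-linearity,
# a fan shape suggests non-constant variability.
plot(menu$calories, resid(fit), xlab = "Calories", ylab = "Residual")
abline(h = 0, lty = 2)

# Histogram of residuals: checks the nearly normal condition.
hist(resid(fit), main = "Residuals", xlab = "Residual")
```

With the real data, only the construction of `menu` would change; the model fit and the two plots stay the same.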


7.26 Body measurements, Part III. Exercise 7.15 introduces data on shoulder girth and height of a group of individuals. The mean shoulder girth is 107.20 cm with a standard deviation of 10.37 cm. The mean height is 171.14 cm with a standard deviation of 9.41 cm. The correlation between height and shoulder girth is 0.67.
  1. Write the equation of the regression line for predicting height.
body.r <- .67
body.x.s <- 10.37 # standard deviation of shoulder girth on x-axis
body.y.s <- 9.41 # standard deviation of height on y-axis
body.x.bar <- 107.20 # mean shoulder girth
body.y.bar <- 171.14 # mean height

body.m <- (body.y.s / body.x.s) * body.r

body.b <- body.y.bar - body.m * body.x.bar
paste0("y = ", round(body.m, 4), " * x + ", round(body.b, 4))
## [1] "y = 0.608 * x + 105.9651"
  1. Interpret the slope and the intercept in this context.

For each additional centimeter of shoulder girth, we would expect height to increase by about 0.61 cm on average. The intercept, about 105.97 cm, is the predicted height of a person with a shoulder girth of 0 cm (a hypothetical state with no practical meaning).

  1. Calculate R^2 of the regression line for predicting height from shoulder girth, and interpret it in the context of the application.
body.rsq <- body.r ^ 2
paste0("R^2 is ", body.rsq)
## [1] "R^2 is 0.4489"

About 45% of the variation in height (the response variable) is explained by shoulder girth through the least squares line fitted above.
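Squaring the correlation works because, in simple linear regression with one predictor, R^2 equals the squared correlation between the two variables. A small sketch that checks this identity on simulated data follows; the data are generated to resemble the summary statistics in the exercise and are not the original measurements, and the sample size is an arbitrary assumption.

```r
# Simulated stand-in data; these are NOT the original body measurements.
set.seed(42)
n <- 500                      # arbitrary sample size for the illustration
r <- 0.67; s_x <- 10.37; s_y <- 9.41
x_bar <- 107.20; y_bar <- 171.14

girth  <- rnorm(n, mean = x_bar, sd = s_x)
height <- y_bar + r * (s_y / s_x) * (girth - x_bar) +
  rnorm(n, sd = s_y * sqrt(1 - r^2))

fit <- lm(height ~ girth)

# In simple linear regression these two quantities are the same number.
r_squared_from_fit <- summary(fit)$r.squared
r_squared_from_cor <- cor(girth, height)^2
all.equal(r_squared_from_fit, r_squared_from_cor)
```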

  1. A randomly selected student from your class has a shoulder girth of 100 cm. Predict the height of this student using the model.
body.x2 <- 100 # Shoulder girth of 100cm
body.yhat2 <- body.m * body.x2 + body.b # reuse the fitted slope and intercept
paste0("The predicted height is ", round(body.yhat2, 1))
## [1] "The predicted height is 166.8"
  1. The student from part (d) is 160 cm tall. Calculate the residual, and explain what this residual means.
body.y2 <- 160 # Actual height of 160cm
body.y.res2 <- body.y2 - body.yhat2
paste0("The residual is ", round(body.y.res2, 1))
## [1] "The residual is -6.8"

The residual is the difference between the observed value and the predicted value. A negative residual means that the model overestimated the height. In this case, based on the randomly selected student’s shoulder girth of 100 cm, we would predict a height of 166.8 cm. The residual of -6.8 cm means the student’s actual height is 6.8 cm less than the model predicts.
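The prediction and residual steps can be wrapped in two small helper functions. This is only a sketch built from the summary statistics given in the exercise; the function names `predict_height` and `height_residual` are illustrative and not part of the original solution.

```r
# Summary statistics from the exercise.
r <- 0.67        # correlation between shoulder girth and height
s_x <- 10.37     # standard deviation of shoulder girth (cm)
s_y <- 9.41      # standard deviation of height (cm)
x_bar <- 107.20  # mean shoulder girth (cm)
y_bar <- 171.14  # mean height (cm)

# Least squares slope and intercept from summary statistics.
slope <- r * s_y / s_x
intercept <- y_bar - slope * x_bar

# Hypothetical helpers: predicted height, and residual = observed - predicted.
predict_height <- function(girth_cm) intercept + slope * girth_cm
height_residual <- function(girth_cm, height_cm) height_cm - predict_height(girth_cm)

round(predict_height(100), 1)        # 166.8
round(height_residual(100, 160), 1)  # -6.8
```

Keeping the sign convention (observed minus predicted) inside one function makes it harder to flip the interpretation: a negative value always means the model overestimated.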

  1. A one year old has a shoulder girth of 56 cm. Would it be appropriate to use this linear model to predict the height of this child?
body.x3 <- 56 # Shoulder girth of 56 cm
body.yhat3 <- body.m * body.x3 + body.b
paste0("The predicted height is ", round(body.yhat3, 1))
## [1] "The predicted height is 140"

The model predicts a height of about 140 cm for that shoulder girth, which is implausible for a one-year-old. As the data were collected from 507 physically active individuals, it is doubtful that one-year-olds were included in the sample. Accordingly, this value is likely outside the range of the original data, so using the model here constitutes extrapolation, which should be avoided.
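One rough way to see how far a 56 cm shoulder girth sits from the sampled individuals is to express it in standard deviations from the mean girth. This is a heuristic sketch, not a formal test for extrapolation; the actual minimum and maximum girths in the dataset are not given here, so only the summary statistics are used.

```r
x_bar <- 107.20    # mean shoulder girth (cm), from the exercise
s_x   <- 10.37     # standard deviation of shoulder girth (cm), from the exercise
girth_child <- 56  # shoulder girth of the one-year-old (cm)

# Number of standard deviations between the child's girth and the sample mean.
z_child <- (girth_child - x_bar) / s_x
round(z_child, 2)  # -4.94
```

A value nearly five standard deviations below the mean supports the conclusion that the child falls well outside the data used to fit the line.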


7.30 Cats, Part I. The following regression output is for predicting the heart weight (in g) of cats from their body weight (in kg). The coefficients are estimated using a dataset of 144 domestic cats.
[Regression output table not reproduced; columns: Estimate, Std. Error, t value, Pr(>|t|).]
  1. Write out the linear model.
cat.n <- 144 # Domestic cat sample
# cat.x  Body weight of cat
# cat.y  Heart weight of cat
cat.b <- -.356 # Intercept

cat.m <- 4.034 # Slope (g of heart weight per kg of body weight)

# cat.y <- cat.m * cat.x + cat.b
paste0("y = ", round(cat.m, 4), " * x + ", round(cat.b, 4))
## [1] "y = 4.034 * x + -0.356"
  1. Interpret the intercept.

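The intercept of -0.356 g is the predicted heart weight of a cat with a body weight of 0 kg. This is again a hypothetical state, and a negative heart weight is not physically meaningful, so the intercept only serves to position the line.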

  1. Interpret the slope.

For each additional kilogram of body weight, we would expect heart weight to increase by 4.034 g on average.
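The slope interpretation can be checked by evaluating the model at two body weights that differ by 1 kg. A brief sketch follows; the function name `predict_heart_weight` and the example body weights of 2 kg and 3 kg are hypothetical and chosen only for illustration.

```r
cat_intercept <- -0.356  # intercept (g), as used in the solution above
cat_slope     <- 4.034   # slope (g of heart weight per kg of body weight)

# Hypothetical helper: predicted heart weight (g) for a body weight in kg.
predict_heart_weight <- function(body_weight_kg) {
  cat_intercept + cat_slope * body_weight_kg
}

predict_heart_weight(3)                            # about 11.746
predict_heart_weight(3) - predict_heart_weight(2)  # about 4.034, the slope
```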

  1. Interpret R^2.
cat.r2 <- .6466

About 65% of the variation in heart weight (the response variable) is explained by body weight.

  1. Calculate the correlation coefficient.
cat.r <- sqrt(cat.r2)
paste0("The correlation coefficient is ", round(cat.r, 3))
## [1] "The correlation coefficient is 0.804"

The correlation coefficient describes the strength and direction of the linear relationship between body weight and heart weight. Taking the positive square root of R^2 is appropriate here because the slope is positive. A value of 1 indicates a perfect positive linear correlation, and a value of -1 a perfect negative linear correlation. The calculated coefficient of 0.804 indicates a strong, positive relationship.
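Because `sqrt()` always returns a non-negative value, the sign of the correlation has to come from the slope. A small sketch that makes this explicit, using the R^2 and slope values from the solution above (the variable names are illustrative):

```r
cat_r2    <- 0.6466  # R^2 used in the solution above
cat_slope <- 4.034   # slope of the fitted line

# The correlation shares its sign with the slope; sqrt() alone loses that sign.
cat_r <- sign(cat_slope) * sqrt(cat_r2)
round(cat_r, 3)  # 0.804
```

For a model with a negative slope, the same expression would return a negative correlation, which `sqrt(cat_r2)` on its own would miss.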