Introduction to Linear Regression - Data 606 Chapter 7 Homework

Heather Geiger - April 22, 2018

Question 7.24

  1. There is generally a positive correlation between the number of calories and the amount of carbohydrates. However, the points are scattered fairly widely around a straight line, i.e. the correlation is not especially strong.
  2. Explanatory variable = calories. Response variable = amount of carbohydrates in grams.
  3. In general, one may want to use a regression line to predict an item of interest given limited information (only the explanatory variable). Often, one will find calories on a menu, but not necessarily full nutritional info. If one were counting carbs, it might be useful to get a sense of the carbohydrate content even if only calories were posted on a menu.
  4. No, these data do not meet the conditions required for fitting a least squares line. One condition is constant variability, but here the variability of the points around the line increases as x increases.
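
The constant variability condition in part 4 can be checked numerically as well as by eye. The following is a minimal sketch on simulated data, not the menu data from the exercise; the variable names, the seed, and the noise model are all assumptions chosen only to produce a fan-shaped scatter.

```r
# Simulated data only; these are NOT the menu data from the exercise.
set.seed(606)
calories <- seq(100, 500, length.out = 200)
# Noise standard deviation grows with calories, mimicking a fan shape.
carbs <- 10 + 0.1 * calories + rnorm(200, mean = 0, sd = 0.03 * calories)

fit <- lm(carbs ~ calories)
res <- resid(fit)

# Compare residual spread in the lower and upper halves of the x range.
lower_half <- calories <= median(calories)
sd_low <- sd(res[lower_half])
sd_high <- sd(res[!lower_half])
c(low = sd_low, high = sd_high)
```

A clearly larger spread in the upper half suggests non-constant variability. Plotting `res` against `calories` shows the same pattern visually.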

Question 7.26

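# Summary statistics from the exercise
sy = 9.41   # standard deviation of height (cm)
sx = 10.37  # standard deviation of shoulder girth (cm)
R = 0.67    # correlation between shoulder girth and height
slope = (sy/sx) * R
# The least squares line passes through the point of means (107.20, 171.14):
# slope*(107.20) + intercept = 171.14
intercept = 171.14 - (slope*107.20)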
round(slope,digits=3)
## [1] 0.608
round(intercept,digits=3)
## [1] 105.965
R^2
## [1] 0.4489
predicted = slope*100 + intercept
round(predicted,digits=2)
## [1] 166.76
actual = 160
round(actual - predicted,digits=2)
## [1] -6.76
  1. y = .608x + 105.965
  2. The slope means that for each additional cm of shoulder girth, predicted height increases by 0.608 cm on average. The intercept would mean that someone with a 0 cm shoulder girth would have a predicted height of 105.965 cm. Since this does not make sense (can’t have 0 cm shoulders), the intercept here serves only to adjust the height of the line.
  3. R^2 here is 0.4489. This means that the regression line explains 44.89% of the variance in height.
  4. The model predicts the height of the student as 166.76 cm.
  5. The residual here is -6.76cm. A negative residual means that the model overestimated the response variable (height).
  6. No, it would not be appropriate to use the linear model to predict the height of this child. The child’s shoulder girth falls many standard deviations away from the sample mean, outside the range of data used to fit the model, so applying the model would be extrapolation.
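
One rough way to judge whether a prediction would be an extrapolation is to express the shoulder girth as a number of standard deviations from the sample mean. This is a sketch under assumptions: `girth_z` is a hypothetical helper, not part of the exercise, and it is shown with the 100 cm student from part 4 because the child's measurement is not restated in these answers.

```r
# Summary statistics as used in the calculation above.
mean_girth <- 107.20  # mean shoulder girth (cm)
sd_girth <- 10.37     # standard deviation of shoulder girth (cm)

# Hypothetical helper: number of standard deviations a shoulder girth
# (in cm) lies from the sample mean.
girth_z <- function(girth_cm) (girth_cm - mean_girth) / sd_girth

# The student from part 4 is well within the bulk of the data.
z_student <- girth_z(100)
round(z_student, 2)
## [1] -0.69
```

Passing the child's shoulder girth to `girth_z` gives the corresponding value for part 6. A large absolute value indicates a point far from the data the line was fit to.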

Question 7.30

  1. y = 4.034x - 0.357
  2. The intercept would mean that a 0 kg cat would have a predicted heart weight of -0.357 g. Since a 0 kg cat does not make sense, the intercept here serves only to adjust the height of the line.
  3. The slope means that for each additional 1 kg of body weight, we would expect a cat’s heart to weigh 4.034 g more on average.
  4. R^2 here means that 64.66% of the variance in heart weight is explained by our linear model.
  5. The correlation coefficient here is sqrt(.6466) = .804. It is positive because the slope is positive.
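
The answers above can be reproduced with a few lines of R. This is a minimal sketch that uses only the coefficients and R^2 quoted in the answers; `predict_heart` is a hypothetical helper name, and the body weights passed to it are arbitrary illustration values.

```r
# Coefficients and R^2 as reported in the answers above.
intercept <- -0.357
slope <- 4.034
r_squared <- 0.6466

# Hypothetical helper: predicted heart weight (g) for a body weight (kg).
predict_heart <- function(body_kg) intercept + slope * body_kg

# The correlation takes the sign of the slope.
r <- sign(slope) * sqrt(r_squared)
round(r, 3)
## [1] 0.804

# A 1 kg difference in body weight changes the prediction by the slope.
one_kg_difference <- predict_heart(3) - predict_heart(2)
```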

Question 7.40

  1. Since the t-value is the slope estimate divided by its standard error, the slope may be obtained by multiplying the t-value by the standard error: 4.13 * 0.0322 = 0.1330.
  2. Yes, these data do provide convincing evidence that the slope of the relationship between teaching evaluation and beauty is positive. The t-value is very high, and the p-value is very low. This means that if the true slope were zero (no relationship), an estimated slope this far above zero would be very unlikely to occur by chance alone.
  3. Yes, these data appear to meet the conditions required for linear regression. They appear to be linear (the data would not be better fit by a model with curves). The residuals are nearly normal. There is constant variability over the range of x (beauty). The observations are independent in that this is not time series data.
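
The arithmetic in part 1 and the size of the p-value in part 2 can be sketched in R. The degrees of freedom are not stated in these answers, so the p-value below uses a normal approximation to the t distribution. That is an assumption, reasonable only for a large sample, and the result should be read as an order of magnitude rather than the exact value from the regression table.

```r
# Values quoted in part 1.
t_value <- 4.13
std_error <- 0.0322

# t = estimate / standard error, so estimate = t * standard error.
slope <- t_value * std_error
round(slope, 4)
## [1] 0.133

# Assumption: normal approximation, because the degrees of freedom
# are not given here. One-sided test of slope > 0.
p_one_sided <- pnorm(t_value, lower.tail = FALSE)
p_one_sided < 0.001
## [1] TRUE
```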