7.24 Nutrition at Starbucks, Part I.
- Describe the relationship between number of calories and amount of carbohydrates (in grams) that Starbucks food menu items contain.
There appears to be a moderate, positive linear relationship between carbs and calories. There also seems to be a skew toward high calories.
- In this scenario, what are the explanatory and response variables?
Calories -> Explanatory; and Carbs -> Response
- Why might we want to fit a regression line to these data?
We may want to see how well we can predict the amount of carbs in a menu item from its calorie count.
- Do these data meet the conditions required for fitting a least squares line?
- Linearity: there appears to be a weak to moderate linear relationship (check)
- Nearly normal residuals: The histogram of residuals is roughly normal (weak check)
- Constant variability: The data appear to have constant variability. (check)
- Independent observations: The food items are independent of one another. (check)
Given this information I would say conditions have been borderline met.
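As a rough sketch of how these four conditions could be checked in R, the snippet below fits a least squares line and looks at the residuals. The data frame `starbucks_sim` and its columns `calories` and `carb` are simulated stand-ins invented for illustration; they are not the actual Starbucks data.

```r
# Simulated stand-in data (assumption: NOT the real Starbucks menu data)
set.seed(1)
starbucks_sim <- data.frame(calories = runif(77, 80, 500))
starbucks_sim$carb <- 9 + 0.1 * starbucks_sim$calories + rnorm(77, sd = 10)

# Fit the least squares line: carbs as response, calories as explanatory
fit <- lm(carb ~ calories, data = starbucks_sim)
res <- resid(fit)

# Linearity and constant variability: residuals vs. the explanatory variable
plot(starbucks_sim$calories, res, xlab = "Calories", ylab = "Residuals")
abline(h = 0, lty = 2)

# Nearly normal residuals: histogram and normal Q-Q plot
hist(res, main = "Residuals")
qqnorm(res)
qqline(res)
```

Independence cannot be read off a plot; it has to be argued from how the data were collected.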
7.26 Body measurements, Part III.
- Write the equation of the regression line for predicting height.
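x <- 107.2    # mean shoulder girth (cm)
y <- 171.14   # mean height (cm)
sx <- 10.37   # SD of shoulder girth (cm)
sy <- 9.41    # SD of height (cm)
R <- 0.67     # correlation between height and shoulder girth
b1 <- (sy / sx) * R   # slope
b1
## [1] 0.6079749
b0 <- y - b1 * x      # intercept: the line passes through (x, y)
b0
## [1] 105.9651
R^2
## [1] 0.4489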
The equation is: height_hat = 105.9650878 + 0.6079749 * shoulder_girth
- Interpret the slope and the intercept in this context.
The slope is the expected change in the response variable for each one-unit change in the explanatory variable. In this case, for every additional centimeter of shoulder girth, height is expected to increase by 0.6079749 cm on average.
The intercept, 105.9650878, is where the regression line crosses the y-axis, or equivalently the predicted value of y when x is 0. The intercept does not always have a meaningful interpretation, because x = 0 may not exist in the data (i.e. the set of people with a shoulder girth of zero is empty).
- Calculate R2 of the regression line for predicting height from shoulder girth, and interpret it in the context of the application.
From part (a) above, R_squared = 0.67^2 = 0.4489. About 44.89% of the variability in height is explained by shoulder girth.
- A randomly selected student from your class has a shoulder girth of 100 cm. Predict the height of this student using the model.
The predicted height will be 105.9651 + (0.6079749 * 100) = 166.7626 cm.
- The student from part (d) is 160 cm tall. Calculate the residual, and explain what this residual means.
The residual ei equals (yi - y_hat), where yi = 160; therefore ei = 160 - 166.7626 = -6.7626. The residual is the difference between the actual and predicted height. The fact that the residual is negative indicates that the model overestimated this student's height, by about 6.76 cm.
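The prediction and residual can be reproduced with a short sketch that uses only the summary statistics given in the exercise. The helper name `predict_height` is introduced here for illustration.

```r
# Summary statistics from the exercise
x_bar <- 107.2   # mean shoulder girth (cm)
y_bar <- 171.14  # mean height (cm)
s_x <- 10.37     # SD of shoulder girth (cm)
s_y <- 9.41      # SD of height (cm)
r <- 0.67        # correlation

# Least squares slope and intercept
b1 <- (s_y / s_x) * r
b0 <- y_bar - b1 * x_bar

# Hypothetical helper: predicted height (cm) for a given shoulder girth (cm)
predict_height <- function(girth) b0 + b1 * girth

y_hat <- predict_height(100)
residual <- 160 - y_hat   # observed minus predicted

round(y_hat, 4)     # 166.7626
round(residual, 4)  # -6.7626
```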
- A one year old has a shoulder girth of 56 cm. Would it be appropriate to use this linear model to predict the height of this child?
No. Best practice is to use a regression model only for values of x that fall within the range of the data used to fit it. A girth of 56 cm falls well outside the range of x values used to create this model (85 cm - 135 cm), so predicting this child's height would be extrapolation.
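One way to guard against accidental extrapolation is to check the input against the observed range before predicting. The sketch below is illustrative only: the function name `predict_height_checked` is hypothetical, and the 85 cm to 135 cm bounds are the approximate range quoted in the answer above.

```r
# Hypothetical helper: returns NA instead of a prediction when the
# shoulder girth lies outside the approximate observed range.
predict_height_checked <- function(girth,
                                   b0 = 105.9651, b1 = 0.6079749,
                                   lower = 85, upper = 135) {
  outside <- girth < lower | girth > upper
  ifelse(outside, NA_real_, b0 + b1 * girth)
}

predict_height_checked(56)   # NA: extrapolation, no prediction returned
predict_height_checked(100)  # about 166.76
```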
7.30 Cats, Part I.
- Write out the linear model.
y_hat = -0.357 + 4.034 * x, where x = body weight and y = heart weight
- Interpret the intercept.
The intercept would imply that heart weight would be -0.357 grams when body weight is 0 kg. However, this has little practical meaning, because an x value of 0 does not really exist and falls outside the range of x values used to create the regression model.
- Interpret the slope.
The slope indicates that for every 1 kg increase in a cat's body weight, the weight of its heart is expected to increase by 4.034 grams on average.
- Interpret R2.
The R-squared of 64.66% suggests that body weight explains about 64.66% of the variability in heart weight.
- Calculate the correlation coefficient.
The correlation coefficient is the square root of R-squared, taken with the sign of the slope. The slope is positive, so R = sqrt(0.6466) = 0.8041144.
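Because the square root alone loses the sign, a small sketch of the calculation takes the sign from the slope. The variable names are illustrative.

```r
r_squared <- 0.6466  # R-squared from the exercise
slope <- 4.034       # slope from the linear model

# The correlation has the same sign as the slope
r <- sign(slope) * sqrt(r_squared)
round(r, 4)  # 0.8041
```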
7.40 Rate my professor.
- Given that the average standardized beauty score is -0.0883 and average teaching evaluation score is 3.9983, calculate the slope. Alternatively, the slope may be computed using just the information provided in the model summary table.
# The least squares line passes through (x, y), the point of means, so b1 = (y - b0) / x
b0 <- 4.010
x <- -0.0883
y <- 3.9983
b1 <- (y-b0) / x
The slope, b1, equals: 0.1325028.
- Do these data provide convincing evidence that the slope of the relationship between teaching evaluation and beauty is positive? Explain your reasoning.
Yes. The slope estimate is positive and the table reports a p-value of approximately zero, so we reject the null hypothesis H0: b1 = 0 in favor of the alternative HA: b1 > 0.
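For reference, the test behind that p-value divides the slope estimate by its standard error and compares the result to a t-distribution with n - 2 degrees of freedom. The sketch below is a minimal version of that calculation. The function `slope_test` is hypothetical, and the standard error and sample size in the usage lines are made-up placeholders, not values from the exercise's summary table.

```r
# Hypothetical helper: one-sided t-test for HA: b1 > 0
slope_test <- function(b1, se, n) {
  t_stat <- b1 / se          # test statistic under H0: b1 = 0
  df <- n - 2                # simple linear regression
  p_value <- pt(t_stat, df = df, lower.tail = FALSE)
  list(t = t_stat, df = df, p_value = p_value)
}

# Placeholder inputs for illustration only (se and n are assumptions)
result <- slope_test(b1 = 0.1325, se = 0.03, n = 100)
result$t
result$p_value
```

The same relationship can be run in reverse: multiplying the t-statistic by the standard error from the table recovers the slope estimate.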
- List the conditions required for linear regression and check if each one is satisfied for this model based on the following diagnostic plots.
- Linearity: the scatterplot and residual plot show no obvious curved pattern, which supports a linear relationship (check)
- Nearly normal residuals: The residual histogram and Q-Q plot (with some skew) support this (weak check)
- Constant variability: The scatterplot shows roughly constant variability, with a few notable outliers (weak check)
- Independent observations: Assuming there was no collusion among students, the observations should be independent. (check)
The skew in the Q-Q plot and the residual outliers give me pause, but I would say the conditions are met.