Nutrition at Starbucks, Part I. (8.22, p. 326) The scatterplot below shows the relationship between the number of calories and amount of carbohydrates (in grams) Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we are interested in predicting the amount of carbs a menu item has based on its calorie content.
Describe the relationship between number of calories and amount of carbohydrates (in grams) that Starbucks food menu items contain. There is a positive, roughly linear relationship: as calories increase, the amount of carbohydrates tends to increase as well.
In this scenario, what are the explanatory and response variables? Calories is the explanatory variable and carbs is the response variable.
Why might we want to fit a regression line to these data? A regression line lets us predict the amount of carbohydrates in a menu item from its calorie count, which is useful because only calories are listed on the display items. It also summarizes the direction and strength of the relationship.
Do these data meet the conditions required for fitting a least squares line? The data appear to meet the conditions of a linear relationship, nearly normal residuals, and independence.
They do not appear to have constant variance around the line: as calories increase, the carbohydrate values drift further from the regression line (the data look heteroscedastic rather than homoscedastic to me).
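The constant-variance concern can also be checked numerically. The sketch below uses simulated data (not the Starbucks menu data, which are not reproduced here), with a fan-shaped spread chosen by assumption to mimic the pattern described above. It compares the residual standard deviation for low-calorie and high-calorie items.

```r
# Simulated data only: the coefficients and spread are assumptions,
# chosen to produce a fan shape like the one described in the text.
set.seed(42)
calories <- runif(300, min = 100, max = 500)
carbs    <- 8 + 0.1 * calories + rnorm(300, sd = 0.03 * calories)

fit <- lm(carbs ~ calories)
res <- resid(fit)

# Compare residual spread below and above the median calorie count
low     <- calories <= median(calories)
sd_low  <- sd(res[low])
sd_high <- sd(res[!low])
cat('residual SD, lower half:', sd_low, '\n')
cat('residual SD, upper half:', sd_high, '\n')
```

A clearly larger residual SD in the upper half is the numeric counterpart of points fanning out from the line. Plotting `res` against `calories` would show the same pattern visually.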
Body measurements, Part I. (8.13, p. 316) Researchers studying anthropometry collected body girth measurements and skeletal diameter measurements, as well as age, weight, height and gender for 507 physically active individuals. The scatterplot below shows the relationship between height and shoulder girth (over deltoid muscles), both measured in centimeters.
\begin{center} \end{center}
Describe the relationship between shoulder girth and height. As shoulder girth increases, height also tends to increase so there is a positive correlation between the two.
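How would the relationship change if shoulder girth was measured in inches while the units of height remained in centimeters? One inch = 2.54 cm, so every shoulder girth value would be divided by 2.54 while height stayed the same. The points would keep the same relative positions, so the form, direction, and strength of the relationship (and therefore the correlation) would not change. Only the scale of the girth axis would change, and the slope of a fitted line would be 2.54 times larger.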
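A quick check on simulated data illustrates how a change of units affects the correlation and the slope. The values below are generated for illustration only (they are not the study's measurements); the means and spread are assumptions loosely based on the summary statistics in this document.

```r
# Simulated data only: these are NOT the 507 measured individuals.
set.seed(1)
girth_cm  <- rnorm(200, mean = 107.20, sd = 10.37)
height_cm <- 105.97 + 0.608 * girth_cm + rnorm(200, sd = 7)

girth_in <- girth_cm / 2.54   # the same girths expressed in inches

# Correlation is unchanged by the unit conversion
r_cm <- cor(girth_cm, height_cm)
r_in <- cor(girth_in, height_cm)
all.equal(r_cm, r_in)

# The slope, however, is rescaled by the conversion factor
slope_cm <- coef(lm(height_cm ~ girth_cm))[[2]]
slope_in <- coef(lm(height_cm ~ girth_in))[[2]]
all.equal(slope_in, 2.54 * slope_cm)
```

Both comparisons should agree up to floating-point tolerance: rescaling a variable by a positive constant leaves the correlation alone and only rescales the slope.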
Body measurements, Part III. (8.24, p. 326) The exercise above introduces data on shoulder girth and height of a group of individuals. The mean shoulder girth is 107.20 cm with a standard deviation of 10.37 cm. The mean height is 171.14 cm with a standard deviation of 9.41 cm. The correlation between height and shoulder girth is 0.67.
hmean = 171.14
hsd = 9.41
gmean = 107.20
gsd = 10.37
R = 0.67
slope = (hsd/gsd) * R
cat('b1 is:', slope, '\n')
## b1 is: 0.6079749
#height = b0 + (b1 * gmean)
#hmean = b0 + (slope * gmean)
b0 = hmean - (slope * gmean)
cat('b0 is: ', b0)
## b0 is: 105.9651
The equation for the regression line is: height = 106 + 0.608(girth)
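Interpret the slope and the intercept in this context. Slope: for each additional centimeter of shoulder girth, the model predicts height to be about 0.608 cm greater, on average. The positive sign agrees with the earlier statement that height and girth are positively correlated. Intercept: a person with a shoulder girth of 0 cm would be predicted to be about 106 cm tall. A girth of 0 cm is impossible, so the intercept has no practical meaning here; it only sets the height of the line.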
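To make the slope and intercept concrete, the sketch below wraps the fitted line in a small helper. The name `predict_height` is hypothetical (it is not part of the original analysis); the coefficients are the `b0` and `b1` values printed above.

```r
# Coefficients taken from the output above
b0 <- 105.9651
b1 <- 0.6079749

# Hypothetical helper: predicted height (cm) for a given shoulder girth (cm)
predict_height <- function(girth) b0 + b1 * girth

# A 1 cm difference in girth changes the prediction by exactly the slope
diff_1cm <- predict_height(101) - predict_height(100)
cat('change per 1 cm of girth:', diff_1cm, '\n')

# A least squares line passes through the point of means, so the
# prediction at the mean girth should be close to the mean height (171.14)
cat('prediction at mean girth:', predict_height(107.20), '\n')
```

The second check is a useful sanity test whenever a line is built from summary statistics rather than raw data.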
Calculate \(R^2\) of the regression line for predicting height from shoulder girth, and interpret it in the context of the application.
Rsquared <- round(R**2,2)
cat('Rsquared is', Rsquared)
## Rsquared is 0.45
About 45% of the variability in height is explained by the linear model using shoulder girth.
#height = 106 + 0.608(girth)
pred_height = 106 + (0.608 * 100)
cat("The student's predicted height is:", pred_height, 'cm')
## The student's predicted height is: 166.8 cm
# residual = observed - predicted
student_residual <- 160 - pred_height
cat('Residual is', student_residual, 'cm')
## Residual is -6.8 cm
A residual of -6.8 cm means that the model overestimated this student's height: the prediction was almost 7 cm higher than the student's actual height of 160 cm. If this student were plotted, the point for their actual height would fall below the regression line.
Cats, Part I. (8.26, p. 327) The following regression output is for predicting the heart weight (in g) of cats from their body weight (in kg). The coefficients are estimated using a dataset of 144 domestic cats.
\begin{center} \end{center}
heart_weight = -0.357 + (4.034 * body_weight)
Interpret the intercept. If a cat had a body weight of 0 kg, the predicted heart weight would be -0.357 g. Neither a body weight of 0 nor a negative heart weight is possible, so the intercept has no practical meaning here; it only sets the height of the line.
Interpret the slope. For each additional kilogram of body weight, the heart weight is predicted to be about 4.034 g higher, on average.
Interpret \(R^2\). An \(R^2\) of 64.66% means that body weight explains about 64.66% of the variability in the cats' heart weights.
Calculate the correlation coefficient. Because the slope is positive, the correlation is the positive square root of \(R^2\): \(\sqrt{0.6466} \approx 0.8041\).
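The same calculation in R, as a minimal sketch. The sign of the correlation is taken from the sign of the slope, since \(R^2\) alone does not say whether the relationship is positive or negative.

```r
R2    <- 0.6466   # R-squared from the regression output
slope <- 4.034    # estimated slope; its sign sets the sign of r

r <- sign(slope) * sqrt(R2)
cat('r is:', round(r, 4), '\n')
## r is: 0.8041
```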
Rate my professor. (8.44, p. 340) Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related characteristics, such as the physical appearance of the instructor. Researchers at University of Texas, Austin collected data on teaching evaluation score (higher score means better) and standardized beauty score (a score of 0 means average, negative score means below average, and a positive score means above average) for a sample of 463 professors. The scatterplot below shows the relationship between these variables, and also provided is a regression output for predicting teaching evaluation score from beauty score.
\begin{center} \end{center}
# t = (estimate - 0)/SE
t = 4.13
SE = 0.0322
estimate = t * SE
cat('the slope is:', estimate)
## the slope is: 0.132986
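Do these data provide convincing evidence that the slope of the relationship between teaching evaluation and beauty is positive? Explain your reasoning. Yes. The hypotheses are \(H_0\): the slope is 0, and \(H_A\): the slope is greater than 0. The estimated slope is about 0.133 with a standard error of 0.0322, giving t = 4.13 on 463 - 2 = 461 degrees of freedom. A t-statistic this large corresponds to a very small p-value, so we reject \(H_0\): the data provide convincing evidence that the slope is positive.
List the conditions required for linear regression and check if each one is satisfied for this model based on the following diagnostic plots. Linear relationship: based on the scatterplot and the residuals plot, there appears to be a linear relationship with no obvious curvature.
Nearly normal residuals: the histogram of the residuals is nearly normal.
Constant variability: the residuals appear to have roughly constant variability around the regression line.
Independence: this condition refers to the observations, not to the two variables. The professors should be independent of one another; nothing in the description suggests that one professor's evaluation depends on another's, so this condition appears reasonable.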
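The strength of the evidence for a positive slope can be quantified with a one-sided p-value. The sketch below assumes the usual t-test for a regression slope with n - 2 degrees of freedom, using the t-statistic and sample size given above.

```r
t_stat <- 4.13   # t-statistic for the slope, from the regression output
n      <- 463    # number of professors in the sample
df     <- n - 2  # simple linear regression estimates two coefficients

# One-sided p-value for H_A: slope > 0
p_one_sided <- pt(t_stat, df = df, lower.tail = FALSE)
cat('one-sided p-value:', p_one_sided, '\n')
```

A p-value far below 0.05 would support rejecting the null hypothesis of a zero slope in favor of a positive one.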