Nutrition at Starbucks, Part I. (8.22, p. 326) The scatterplot below shows the relationship between the number of calories and amount of carbohydrates (in grams) Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we are interested in predicting the amount of carbs a menu item has based on its calorie content.
There is a positive, linear association between the number of calories and the amount of carbohydrates: items with more calories tend to contain more carbs. However, the variability of the residuals increases as calories increase.
Explanatory variable: calories. Response variable: carbohydrates (in grams).
We fit a regression line because we want to predict the amount of carbs in a menu item from its number of calories.
Body measurements, Part I. (8.13, p. 316) Researchers studying anthropometry collected body girth measurements and skeletal diameter measurements, as well as age, weight, height and gender for 507 physically active individuals. The scatterplot below shows the relationship between height and shoulder girth (over deltoid muscles), both measured in centimeters.
There is a positive, linear association between shoulder girth and height: individuals with larger shoulder girths tend to be taller.
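If shoulder girth were measured in inches while height stayed in centimeters, the form, direction, and strength of the relationship would not change. Only the scale of the shoulder girth axis would change, and with it the numerical value of the slope of a fitted line.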
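A small simulation can illustrate how a change of units affects a linear relationship. The sketch below uses simulated values, not the original 507 measurements; the means, spreads, and coefficients are rough assumptions chosen only to resemble the scatterplot. It compares the correlation and the fitted slope before and after converting shoulder girth from centimeters to inches.

```r
set.seed(42)  # simulated data, NOT the original body-measurement data

# Hypothetical shoulder girths (cm) and heights (cm) with a linear trend
shoulder_cm <- rnorm(200, mean = 107, sd = 10)
height_cm <- 106 + 0.61 * shoulder_cm + rnorm(200, sd = 7)

# Convert shoulder girth to inches (1 inch = 2.54 cm)
shoulder_in <- shoulder_cm / 2.54

# The correlation is unchanged by the unit conversion
r_cm <- cor(shoulder_cm, height_cm)
r_in <- cor(shoulder_in, height_cm)
all.equal(r_cm, r_in)

# The slope is rescaled by the conversion factor
slope_cm <- unname(coef(lm(height_cm ~ shoulder_cm))[2])
slope_in <- unname(coef(lm(height_cm ~ shoulder_in))[2])
all.equal(slope_in, 2.54 * slope_cm)
```

Both comparisons should return TRUE: rescaling a variable stretches or shrinks the axis but leaves the strength and direction of the association as they were.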
Body measurements, Part III. (8.24, p. 326) Exercise above introduces data on shoulder girth and height of a group of individuals. The mean shoulder girth is 107.20 cm with a standard deviation of 10.37 cm. The mean height is 171.14 cm with a standard deviation of 9.41 cm. The correlation between height and shoulder girth is 0.67.
\(\widehat{y} = \beta_0 + \beta_1 x\)
\(\beta_1 = \frac{s_y}{s_x} R\)
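# Summary statistics: x = shoulder girth (cm), y = height (cm)
s_y <- 9.41
s_x <- 10.37
r <- 0.67
y_mean <- 171.14
x_mean <- 107.20
# Slope from the correlation and the two standard deviations
beta1 <- (s_y / s_x) * r
# Intercept: the least squares line passes through (x_mean, y_mean)
beta0 <- y_mean - beta1 * x_mean
cat(beta0, beta1)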
## 105.9651 0.6079749
The equation of the regression line is
\(\widehat{y} = 105.9651 + 0.6079749 \times x\)
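As a sanity check on the summary-statistic formulas, the sketch below builds a simulated data set whose sample means, standard deviations, and correlation match the reported values exactly, then fits `lm()` to it. The data are artificial (the original measurements are not used), and the helper `standardize` is a hypothetical name introduced for this illustration.

```r
set.seed(1)  # simulated data, NOT the original body-measurement data
n <- 507
r <- 0.67

# Hypothetical helper: rescale a vector to sample mean 0 and sample SD 1
standardize <- function(v) (v - mean(v)) / sd(v)

zx <- standardize(rnorm(n))

# Noise made exactly uncorrelated with zx, then standardized
e <- rnorm(n)
noise <- standardize(resid(lm(e ~ zx)))

# zy has sample SD 1 and sample correlation r with zx
zy <- r * zx + sqrt(1 - r^2) * noise

# Rescale to the reported means and standard deviations
shoulder <- 107.20 + 10.37 * zx
height <- 171.14 + 9.41 * zy

fit <- lm(height ~ shoulder)
coef(fit)
```

The fitted intercept and slope should agree with the values computed from the summary statistics, up to floating-point rounding.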
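Slope: for each additional 1 cm of shoulder girth, the model predicts an additional 0.61 cm of height, on average.
Intercept: the model predicts a height of 105.97 cm for a shoulder girth of 0 cm. A shoulder girth of 0 cm is not possible, so the intercept has no practical meaning here; it only sets the height of the line.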
r_square <- r^2
r_square
## [1] 0.4489
About 44.89% of the variability in height is explained by the linear model with shoulder girth as the predictor.
ht <- 105.9651 + 0.6079749*100
ht
## [1] 166.7626
res <- 160 - ht
res
## [1] -6.76259
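The residual is negative, which means the model overestimates this individual's height: it predicts 166.76 cm, while the observed height is 160 cm.
The shoulder girths in the original data set range from roughly 85 cm to 135 cm, so a shoulder girth of 56 cm lies well outside the observed range. Predicting height for this value would be extrapolation, so using the linear model here would not be appropriate.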
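The extrapolation concern can be built into the prediction step. The sketch below is one possible guard, not part of the original analysis: `predict_height` is a hypothetical helper, and the default bounds of 85 cm and 135 cm are the approximate range read off the scatterplot.

```r
# Hypothetical helper: predict height (cm) from shoulder girth (cm),
# returning NA (with a warning) for values outside the observed range.
predict_height <- function(girth, lo = 85, hi = 135) {
  outside <- girth < lo | girth > hi
  if (any(outside)) {
    warning("shoulder girth outside the observed range; prediction would be extrapolation")
  }
  pred <- 105.9651 + 0.6079749 * girth
  pred[outside] <- NA_real_
  pred
}

predict_height(100)                   # within range: returns a prediction
suppressWarnings(predict_height(56))  # out of range: returns NA
```

Returning `NA` rather than a number makes it harder to use an extrapolated prediction by accident.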
Cats, Part I. (8.26, p. 327) The following regression output is for predicting the heart weight (in g) of cats from their body weight (in kg). The coefficients are estimated using a dataset of 144 domestic cats.
\(\widehat{y} = -0.357 + 4.034 \times x\)
Intercept: the model predicts a heart weight of -0.357 g for a cat with a body weight of 0 kg. This is not a meaningful value; the intercept only sets the height of the line.
Slope: for each additional 1 kg of body weight, the model predicts an additional 4.034 g of heart weight, on average.
\(R^2\): body weight explains about 64.66% of the variability in heart weight.
R2 <- 0.6466
# R has the same sign as the slope (positive here), so take the positive root
R <- sqrt(R2)
R
## [1] 0.8041144
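Taking the square root of \(R^2\) gives only the magnitude of the correlation; its sign has to come from the slope. A minimal sketch of that rule follows, where `r_from_r2` is a hypothetical helper name introduced for illustration.

```r
# Hypothetical helper: recover the correlation from R^2 and the fitted slope
r_from_r2 <- function(r2, slope) sign(slope) * sqrt(r2)

# Cats model: positive slope, so the correlation is positive
r_from_r2(0.6466, 4.034)

# Illustration only: the same R^2 with a negative slope gives a negative correlation
r_from_r2(0.6466, -4.034)
```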
Rate my professor. (8.44, p. 340) Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related characteristics, such as the physical appearance of the instructor. Researchers at University of Texas, Austin collected data on teaching evaluation score (higher score means better) and standardized beauty score (a score of 0 means average, negative score means below average, and a positive score means above average) for a sample of 463 professors. The scatterplot below shows the relationship between these variables, and also provided is a regression output for predicting teaching evaluation score from beauty score.
Equation: \(\widehat{y} = \beta_0 + \beta_1 x\)
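# The least squares line passes through the point of means, so
# slope = (mean evaluation score - intercept) / mean beauty score
slope <- (3.9983 - 4.010) / -0.0883
slope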
## [1] 0.1325028
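The slope calculation rests on the fact that a least squares line passes through the point of means \((\bar{x}, \bar{y})\). Written out, taking \(\beta_0 = 4.010\) from the regression output, \(\bar{y} = 3.9983\) as the mean evaluation score, and \(\bar{x} = -0.0883\) as the mean beauty score (the negative sign is taken from the exercise statement and is an assumption here, since the values above appear only as magnitudes):

\[
\bar{y} = \beta_0 + \beta_1 \bar{x}
\quad\Longrightarrow\quad
\beta_1 = \frac{\bar{y} - \beta_0}{\bar{x}} = \frac{3.9983 - 4.010}{-0.0883} = \frac{-0.0117}{-0.0883} \approx 0.1325
\]

The numerator and denominator are both negative, so the signs cancel and the slope is positive.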
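\(\beta_1 = \frac{s_y}{s_x} R\)
Because \(s_x\) and \(s_y\) are always positive, the sign of the slope matches the sign of \(R\). The scatterplot shows a positive association, so \(R\) and the estimated slope (0.1325) are both positive. Whether this is convincing evidence of a positive slope is judged from the t-test for the slope coefficient in the regression output.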
Linearity: The relationship between beauty score and teaching evaluation score appears linear in the scatterplot.
Nearly normal residuals: The histogram of the residuals shows that they are nearly normal.
Constant variability: The scatterplot of the residuals shows roughly constant variability.
Independent observations: The observations are assumed to be independent, since the data come from a sample of 463 professors.