Nutrition at Starbucks, Part I. (8.22, p. 326) The scatterplot below shows the relationship between the number of calories and amount of carbohydrates (in grams) Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we are interested in predicting the amount of carbs a menu item has based on its calorie content.
Answer
There is a direct, positive correlation between amount of carbohydrates and calories in a Starbucks food menu item.
Answer
The explanatory variable is the number of calories in a menu item and the response variable is the amount of carbohydrates (in grams) in the item. Because we want to predict carbohydrates from calories, carbohydrates must be the response variable.
Answer
We may want to fit a regression line to these data in order to better predict how many carbohydrates would be in a menu item if we are given the calorie content.
Answer
Judging by the residual plot, the residuals are not scattered with constant spread around zero: as calories increase, the residuals fan out. The data therefore do not meet the constant-variance condition, so the least squares line should not be used as is.
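One rough numeric check of the constant-variance condition is to compare the spread of the residuals at low and high values of the predictor. The sketch below uses simulated data, not the Starbucks menu; the variable names and the noise model are assumptions chosen only to mimic a fan-shaped residual plot.

```r
# Simulated stand-in data (NOT the Starbucks menu): the noise grows with
# calories, which produces a fan-shaped residual plot.
set.seed(1)
calories <- runif(200, min = 100, max = 500)
carbs    <- 10 + 0.1 * calories + rnorm(200, mean = 0, sd = 0.05 * calories)

fit <- lm(carbs ~ calories)
res <- resid(fit)

# Compare residual spread below and above the median calorie value.
low     <- calories <= median(calories)
sd_low  <- sd(res[low])
sd_high <- sd(res[!low])

# A ratio well above 1 suggests the variance is not constant.
spread_ratio <- sd_high / sd_low
spread_ratio
```

A ratio near 1 would be consistent with constant variance; this is only a quick screen and does not replace looking at the residual plot.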
Body measurements, Part I. (8.13, p. 316) Researchers studying anthropometry collected body girth measurements and skeletal diameter measurements, as well as age, weight, height and gender for 507 physically active individuals. The scatterplot below shows the relationship between height and shoulder girth (over deltoid muscles), both measured in centimeters.
Answer
There is a direct, positive correlation between shoulder girth and height. As shoulder girth increases, height also increases in a relatively linear fashion.
Answer
If shoulder girth were measured in inches and height remained in centimeters, the relationship itself would not change: the direction, form, and strength of the association, and therefore the correlation, stay the same. Only the scale of the horizontal axis and the numeric value of the slope would change. Because 1 inch = 2.54 cm, the slope would be 2.54 times larger (centimeters of height per inch of girth instead of per centimeter of girth).
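The effect of a unit change can be checked directly. The sketch below uses simulated measurements, not the study data; the means and noise level are arbitrary assumptions. It shows that converting girth to inches leaves the correlation alone and multiplies the slope by 2.54.

```r
# Simulated stand-in measurements in centimeters (NOT the study data).
set.seed(2)
girth_cm  <- rnorm(100, mean = 107, sd = 10)
height_cm <- 106 + 0.6 * girth_cm + rnorm(100, mean = 0, sd = 7)

girth_in <- girth_cm / 2.54   # convert girth to inches

# Correlation is unit-free, so it is the same in both versions.
r_cm <- cor(girth_cm, height_cm)
r_in <- cor(girth_in, height_cm)

# The slope is in (cm of height) per (unit of girth), so it rescales.
slope_cm <- unname(coef(lm(height_cm ~ girth_cm))[2])
slope_in <- unname(coef(lm(height_cm ~ girth_in))[2])

c(r_cm = r_cm, r_in = r_in, slope_ratio = slope_in / slope_cm)
```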
Body measurements, Part III. (8.24, p. 326) The exercise above introduces data on shoulder girth and height of a group of individuals. The mean shoulder girth is 107.20 cm with a standard deviation of 10.37 cm. The mean height is 171.14 cm with a standard deviation of 9.41 cm. The correlation between height and shoulder girth is 0.67.
Answer
mean_g <- 107.2
stdev_g <- 10.37
mean_h <- 171.14
stdev_h <- 9.41
R <- 0.67
m <- (stdev_h/stdev_g) * R
paste("slope: ", round(m, 5))
## [1] "slope: 0.60797"
b <- mean_h - (m*mean_g)
paste("intercept: ", round(b, 5))
## [1] "intercept: 105.96509"
Using the line y = mx + b: predicted height = 0.60797 * girth + 105.96509 (height and girth in cm)
Answer
The intercept, 105.96509 cm, is the predicted height for a person with a shoulder girth of 0 cm. That value has no practical meaning, since no one has a shoulder girth of 0 cm; it only sets the height of the line and is not a minimum height. The slope says that for each additional centimeter of shoulder girth, the model predicts an additional 0.60797 cm of height, on average.
Answer
R^2
## [1] 0.4489
\(R^2\) = 0.4489. In context, about 44.89% of the variability in height is explained by the linear model using shoulder girth.
Answer
paste("Height: ", round(m*100 + b, 5))
## [1] "Height: 166.76258"
Answer
paste("Residual: ", round(160 - (m*100 + b),5) )
## [1] "Residual: -6.76258"
This residual means that the student is 6.76 cm shorter than the model predicts; the model overestimates this student's height.
Answer
The shoulder girths in the data are all above about 80 cm, so 56 cm lies far outside the observed range. Using the model for this child would be extrapolation, and the prediction would not be trustworthy.
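As a small illustration of guarding against extrapolation, the sketch below wraps the fitted line in a hypothetical helper, `predict_height`, that returns `NA` outside a stated girth range. The lower bound of 80 cm comes from the answer above; the upper bound of 140 cm is an assumed placeholder, not a value taken from the data.

```r
# Fitted line from the summary statistics in this exercise.
m <- (9.41 / 10.37) * 0.67
b <- 171.14 - m * 107.20

# Hypothetical helper: predicts height (cm) from shoulder girth (cm), but
# returns NA when girth falls outside the range the model was fit on.
# `upper = 140` is an assumed placeholder, not read from the data.
predict_height <- function(girth, lower = 80, upper = 140) {
  ifelse(girth < lower | girth > upper, NA_real_, m * girth + b)
}

predict_height(c(56, 100))
```

With these settings the call returns `NA` for 56 cm and the usual prediction for 100 cm.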
Cats, Part I. (8.26, p. 327) The following regression output is for predicting the heart weight (in g) of cats from their body weight (in kg). The coefficients are estimated using a dataset of 144 domestic cats.
Answer
library(MASS)  # assumed source of the cats data (Bwt in kg, Hwt in g)
stdev_B <- sd(cats$Bwt)
mean_B <- mean(cats$Bwt)
stdev_H <- sd(cats$Hwt)
mean_H <- mean(cats$Hwt)
covariance <- cov(cats$Hwt, cats$Bwt)
R <- covariance/(stdev_H*stdev_B)
m <- (stdev_H/stdev_B) * R
b <- mean_H - (m*mean_B)
paste("slope: ", round(m, 5))
## [1] "slope: 4.03406"
paste("intercept: ", round(b, 5))
## [1] "intercept: -0.35666"
\(\widehat{\text{Heart Weight}} = -0.35666 + 4.03406 \times \text{Body Weight}\)
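As a cross-check, the same coefficients can be obtained from `lm()`. This sketch assumes `cats` is the data set shipped with the MASS package, which matches the column names `Bwt` and `Hwt` used above.

```r
library(MASS)   # assumption: `cats` is the data set from the MASS package

# Heart weight (g) regressed on body weight (kg).
fit <- lm(Hwt ~ Bwt, data = cats)
coef(fit)

# Same line from summary statistics: slope = R * s_y / s_x,
# intercept chosen so the line passes through the point of means.
m <- cor(cats$Bwt, cats$Hwt) * sd(cats$Hwt) / sd(cats$Bwt)
b <- mean(cats$Hwt) - m * mean(cats$Bwt)
c(intercept = b, slope = m)
```

Both approaches should agree up to floating-point rounding, since the least squares slope and intercept are defined by exactly these formulas.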
Answer
The intercept, -0.35666, is the predicted heart weight (in g) of a cat with a body weight of 0 kg. This is not a meaningful value, since no cat weighs 0 kg and a heart cannot have negative mass; the intercept only sets the height of the line.
Answer
The slope of 4.03406 means that for each additional kilogram of body weight, the predicted heart weight increases by about 4.034 g, on average. The rate is in grams of heart weight per kilogram of body weight; it does not mean that heart weight increases fourfold.
Answer
paste("R-squared: ", round(R^2, 5))
## [1] "R-squared: 0.64662"
The R-squared value of 0.64662 means that about 64.66% of the variability in heart weight is explained by the linear model using body weight.
Answer
stdev_B <- sd(cats$Bwt)
stdev_H <- sd(cats$Hwt)
covariance <- cov(cats$Hwt, cats$Bwt)
R <- covariance/(stdev_H*stdev_B)
R
## [1] 0.8041274
The correlation coefficient was already computed above; the calculation is repeated here for completeness. The correlation coefficient equals the covariance of x and y divided by the product of their standard deviations. Equivalently, it is the square root of R-squared, taken with the sign of the slope (positive here).
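The three routes to the correlation coefficient can be compared side by side. This sketch again assumes `cats` comes from the MASS package; `r_from_fit` is an illustrative name for the value recovered from the fitted model's R-squared.

```r
library(MASS)   # assumption: `cats` is the data set from the MASS package

# Route 1: covariance divided by the product of the standard deviations.
r_manual <- cov(cats$Hwt, cats$Bwt) / (sd(cats$Hwt) * sd(cats$Bwt))

# Route 2: the built-in correlation function.
r_builtin <- cor(cats$Hwt, cats$Bwt)

# Route 3: square root of R-squared, signed by the slope of the fitted line.
fit <- lm(Hwt ~ Bwt, data = cats)
r_from_fit <- sign(coef(fit)[["Bwt"]]) * sqrt(summary(fit)$r.squared)

c(manual = r_manual, builtin = r_builtin, from_fit = r_from_fit)
```

The third route is useful when only the regression output, and not the raw data, is available.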
Rate my professor. (8.44, p. 340) Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related characteristics, such as the physical appearance of the instructor. Researchers at University of Texas, Austin collected data on teaching evaluation score (higher score means better) and standardized beauty score (a score of 0 means average, negative score means below average, and a positive score means above average) for a sample of 463 professors. The scatterplot below shows the relationship between these variables, and also provided is a regression output for predicting teaching evaluation score from beauty score.
Answer
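# The least squares line passes through (mean beauty, mean score) = (-0.0883, 3.9983).
# With intercept 4.010, solve 3.9983 = 4.010 + m * (-0.0883) for the slope m.
m <- (3.9983 - 4.010)/(-.0883)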
paste("Slope: ", round(m,5))
## [1] "Slope: 0.1325"
Answer
Yes. The estimated slope is positive (about 0.1325), but a positive point estimate is not by itself evidence. The relevant check is the hypothesis test \(H_0: \beta_1 = 0\) versus \(H_A: \beta_1 > 0\), using the t statistic and p-value in the beauty row of the regression output. Because that p-value is small, we reject \(H_0\) and conclude that the data provide convincing evidence of a positive relationship between beauty score and teaching evaluation score.
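The test on the slope can be read off a fitted model. The sketch below uses simulated data, not the professor evaluations; the sample size, slope, and noise level are arbitrary assumptions chosen only to show where the t statistic and p-value come from.

```r
# Simulated stand-in data (NOT the professor evaluations).
set.seed(3)
beauty <- rnorm(100)
score  <- 4 + 0.5 * beauty + rnorm(100, mean = 0, sd = 0.5)

fit <- lm(score ~ beauty)

# Row of the coefficient table for the slope:
# Estimate, Std. Error, t value, Pr(>|t|) (the two-sided p-value).
slope_row <- summary(fit)$coefficients["beauty", ]

t_stat <- slope_row[["t value"]]

# One-sided p-value for HA: slope > 0, with n - 2 degrees of freedom.
p_one_sided <- pt(t_stat, df = df.residual(fit), lower.tail = FALSE)
p_one_sided
```

When the t statistic is positive, this one-sided p-value is half of the two-sided value that `summary()` prints.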
Answer
Linearity: The points show a roughly linear trend with only minor deviations, so a linear model is reasonable. Nearly normal residuals: The histogram of the residuals is slightly left-skewed but close to normal. Constant variability: The residuals show roughly constant spread above and below zero across the range of beauty scores, with no obvious fanning pattern.