Nutrition at Starbucks, Part I. (8.22, p. 326)

The scatterplot below shows the relationship between the number of calories and amount of carbohydrates (in grams) Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we are interested in predicting the amount of carbs a menu item has based on its calorie content.

  1. Describe the relationship between number of calories and amount of carbohydrates (in grams) that Starbucks food menu items contain.
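  • There is a positive, roughly linear relationship between the number of calories and the amount of carbohydrates (in grams). The relationship is only moderately strong because the points are widely scattered, especially for higher-calorie items.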
  1. In this scenario, what are the explanatory and response variables?
  • Calories is the explanatory variable and the amount of carbohydrates (in grams) is the response variable.
  1. Why might we want to fit a regression line to these data?
  • A regression line would let us predict the amount of carbohydrates (in grams) in a menu item when only its calorie count is available.
  1. Do these data meet the conditions required for fitting a least squares line?
  • No. In the second graph (the residual plot), the spread of the residuals grows as calories increase, so the constant variability condition is not met.

-——————————————————————————-

Body measurements, Part I. (8.13, p. 316)

Researchers studying anthropometry collected body girth measurements and skeletal diameter measurements, as well as age, weight, height and gender for 507 physically active individuals. The scatterplot below shows the relationship between height and shoulder girth (over deltoid muscles), both measured in centimeters.

  1. Describe the relationship between shoulder girth and height.
  • The graph shows a positive, approximately linear relationship: individuals with larger shoulder girth tend to be taller.
  1. How would the relationship change if shoulder girth was measured in inches while the units of height remained in centimeters?
  • Converting shoulder girth to inches divides every girth value by 2.54, so only the scale of the shoulder girth axis changes. The form, direction and strength of the relationship stay the same, and the correlation is unchanged. The slope of a fitted line would be multiplied by 2.54, because a 1 inch increase in girth equals a 2.54 cm increase.
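
The unit-change argument can be checked numerically. The sketch below uses made-up girth and height values (hypothetical, not the study's 507 observations) and two hand-written helpers, `pearson_r` and `ls_slope`, which are illustrative names only. It shows that dividing girth by 2.54 leaves the correlation unchanged and multiplies the least squares slope by 2.54.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ls_slope(xs, ys):
    """Least squares slope for predicting ys from xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical measurements, for illustration only.
girth_cm = [95.0, 102.5, 108.0, 112.0, 118.5, 124.0]
height_cm = [160.0, 168.0, 166.5, 175.0, 172.0, 181.0]
girth_in = [g / 2.54 for g in girth_cm]  # same girths expressed in inches

r_cm = pearson_r(girth_cm, height_cm)
r_in = pearson_r(girth_in, height_cm)
slope_cm = ls_slope(girth_cm, height_cm)
slope_in = ls_slope(girth_in, height_cm)

print("correlation (cm, in):", r_cm, r_in)
print("slope (cm, in):", slope_cm, slope_in)
```

The two correlations agree up to floating-point error, while the slope per inch is 2.54 times the slope per centimeter.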

-——————————————————————————-

Body measurements, Part III. (8.24, p. 326)

Exercise 8.13 above introduces data on shoulder girth and height of a group of individuals. The mean shoulder girth is 107.20 cm with a standard deviation of 10.37 cm. The mean height is 171.14 cm with a standard deviation of 9.41 cm. The correlation between height and shoulder girth is 0.67.

  1. Write the equation of the regression line for predicting height.
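  • The slope is \(b_1 = r \cdot s_y / s_x = 0.67 \times 9.41 / 10.37 \approx 0.608\) and the intercept is \(b_0 = \bar{y} - b_1 \bar{x} = 171.14 - 0.608 \times 107.20 \approx 105.97\). The equation is \(\hat{y} = 105.97 + 0.608x\), where x is shoulder girth (in cm) and \(\hat{y}\) is the predicted height (in cm).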
## [1] 0.6079749
## [1] 105.9651
  1. Interpret the slope and the intercept in this context.
  • The slope means that for every 1 cm increase in shoulder girth, the predicted height increases by 0.608 cm on average.

  • The intercept means that when shoulder girth is 0 cm, the predicted height is 105.97 cm, which is impossible in the real world. The intercept has no meaningful interpretation here; it only adjusts the height of the line.

  1. Calculate \(R^2\) of the regression line for predicting height from shoulder girth, and interpret it in the context of the application.
  • \(R^2 = 0.67^2 = 0.4489\). It means that about 44.89% of the variability in height is explained by the linear model with shoulder girth.
## [1] 0.4489
  1. A randomly selected student from your class has a shoulder girth of 100 cm. Predict the height of this student using the model.
  • The predicted height of a student with a shoulder girth of 100 cm is about 166.76 cm.
## [1] 166.7626
  1. The student from part (d) is 160 cm tall. Calculate the residual, and explain what this residual means.
  • The residual is observed minus predicted height: 160 - 166.7626 = -6.7626 cm. The negative residual means the model overestimates the height of this student by 6.7626 cm.
## [1] -6.7626
  1. A one year old has a shoulder girth of 56 cm. Would it be appropriate to use this linear model to predict the height of this child?
  • No. A shoulder girth of 56 cm is nearly 5 standard deviations below the mean, far outside the range of the observed data. Using the model to predict the height of this child would be extrapolation, so it is not appropriate.
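
The calculations in this exercise can be reproduced from the summary statistics alone. The sketch below is one way to do so, assuming the usual least squares identities (slope = r · s_y / s_x, and the line passing through the point of means). The variable and function names are illustrative and not taken from the original analysis.

```python
# Summary statistics given in the exercise.
mean_girth, sd_girth = 107.20, 10.37    # shoulder girth, cm
mean_height, sd_height = 171.14, 9.41   # height, cm
r = 0.67                                # correlation

# Least squares slope and intercept from summary statistics.
slope = r * sd_height / sd_girth
intercept = mean_height - slope * mean_girth

# Share of the variability in height explained by the model.
r_squared = r ** 2

def predict_height(girth_cm):
    """Predicted height (cm) for a given shoulder girth (cm)."""
    return intercept + slope * girth_cm

predicted = predict_height(100)   # student with a 100 cm shoulder girth
residual = 160 - predicted        # observed minus predicted

# How far a 56 cm shoulder girth is from the mean, in standard deviations.
z_child = (56 - mean_girth) / sd_girth

print("slope:", round(slope, 7))
print("intercept:", round(intercept, 4))
print("R^2:", round(r_squared, 4))
print("predicted:", round(predicted, 4))
print("residual:", round(residual, 4))
print("z for 56 cm:", round(z_child, 2))
```

The z-score of roughly -4.9 is the basis for calling the one-year-old prediction an extrapolation.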

-——————————————————————————-

Cats, Part I. (8.26, p. 327)

The following regression output is for predicting the heart weight (in g) of cats from their body weight (in kg). The coefficients are estimated using a dataset of 144 domestic cats.

  1. Write out the linear model.
  • The linear model is \(\hat{y} = -0.3567 + 4.0341x\), where \(\hat{y}\) is the predicted heart weight (in g) and x is the body weight (in kg).
## 
## Call:
## lm(formula = cats$Hwt ~ cats$Bwt)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5694 -0.9634 -0.0921  1.0426  5.1238 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -0.3567     0.6923  -0.515    0.607    
## cats$Bwt      4.0341     0.2503  16.119   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.452 on 142 degrees of freedom
## Multiple R-squared:  0.6466, Adjusted R-squared:  0.6441 
## F-statistic: 259.8 on 1 and 142 DF,  p-value: < 2.2e-16
  1. Interpret the intercept.
  • The intercept means that a cat with a body weight of 0 kg would have a predicted heart weight of -0.3567 g, which does not make sense. The intercept has no meaningful interpretation here; it only adjusts the height of the line used to predict heart weight (in g) from body weight (in kg).
  1. Interpret the slope.
  • The slope means that for every 1 kg increase in a cat's body weight, its predicted heart weight increases by 4.0341 g on average.
  1. Interpret \(R^2\).
  • \(R^2\) is 0.6466209. It means that about 64.66% of the variability in heart weight (in g) is explained by the linear model with body weight (in kg).
## [1] 0.6466209
  1. Calculate the correlation coefficient.
  • The correlation coefficient is \(R = \sqrt{0.6466209} = 0.8041274\). It is positive because the slope is positive.
## [1] 0.8041274
## [1] 0.8041274
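
As a cross-check, the correlation can be recovered from \(R^2\) and the sign of the slope. The sketch below uses the coefficients from the regression output. The function name `predict_heart_weight` and the 3 kg example cat are hypothetical, chosen only for illustration and assuming that such a weight lies within the range of the observed data.

```python
from math import sqrt, copysign

# Coefficients and R-squared taken from the regression output above.
intercept = -0.3567      # g
slope = 4.0341           # g per kg
r_squared = 0.6466209

def predict_heart_weight(body_weight_kg):
    """Predicted heart weight (g) for a given body weight (kg)."""
    return intercept + slope * body_weight_kg

# The correlation has the same sign as the slope.
r = copysign(sqrt(r_squared), slope)

# Hypothetical example: a cat weighing 3 kg.
example = predict_heart_weight(3.0)

print("correlation:", round(r, 7))
print("predicted heart weight for a 3 kg cat (g):", round(example, 4))
```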

-——————————————————————————-

Rate my professor. (8.44, p. 340)

Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related characteristics, such as the physical appearance of the instructor. Researchers at University of Texas, Austin collected data on teaching evaluation score (higher score means better) and standardized beauty score (a score of 0 means average, negative score means below average, and a positive score means above average) for a sample of 463 professors. The scatterplot below shows the relationship between these variables, and also provided is a regression output for predicting teaching evaluation score from beauty score.

  1. Given that the average standardized beauty score is -0.0883 and average teaching evaluation score is 3.9983, calculate the slope. Alternatively, the slope may be computed using just the information provided in the model summary table.
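  • The slope is 0.133. Because the least squares line passes through the point of means, \(b_1 = (\bar{y} - b_0) / \bar{x} = (3.9983 - 4.010) / (-0.0883) \approx 0.1325\), which agrees with the estimate of 0.133 in the model summary.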
## 
## Call:
## lm(formula = eval ~ beauty)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.80015 -0.36304  0.07254  0.40207  1.10373 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.01002    0.02551 157.205  < 2e-16 ***
## beauty       0.13300    0.03218   4.133 4.25e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5455 on 461 degrees of freedom
## Multiple R-squared:  0.03574,    Adjusted R-squared:  0.03364 
## F-statistic: 17.08 on 1 and 461 DF,  p-value: 4.247e-05
## [1] 0.1325028
  1. Do these data provide convincing evidence that the slope of the relationship between teaching evaluation and beauty is positive? Explain your reasoning.
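  • Yes. The hypotheses are \(H_0: \beta_1 = 0\) versus \(H_A: \beta_1 > 0\). From the model summary, the t statistic for beauty is 4.133 with a two-sided p-value of 0.0000425, so the one-sided p-value is about half of that. Since it is far below 0.05, we reject \(H_0\): the data provide convincing evidence that the slope is positive.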
  1. List the conditions required for linear regression and check if each one is satisfied for this model based on the following diagnostic plots.

  • 1: Linearity

The residuals in the scatter plot are dispersed randomly around y=0 without any obvious shape or pattern, so a straight line is a reasonable description of the relationship between beauty score and teaching evaluation score, even though the relationship itself is weak (\(R^2\) is about 0.036). Therefore, the linearity condition appears to be met.

  • 2: Nearly normal residuals

The histogram is unimodal and nearly normal, with perhaps a slight left skew (the residuals range from -1.80 to 1.10) but no extreme outliers. The points in the normal probability plot also lie close to the normal line. Therefore, the nearly normal residuals condition appears to be met.

  • 3: Constant variability

In the residual scatter plot used for the linearity check, the vertical spread of the residuals is roughly the same across the range of beauty scores. A few points lie below y=-1.5, but most lie between y=-1.5 and y=1.5 throughout. Therefore, the constant variability condition appears to be met.

  • 4: Independence of errors

In the fourth graph, the residuals are scattered randomly with no trend. This suggests the errors do not depend on the time of data collection or on earlier errors, so there is no sign of time-series structure. Therefore, this condition appears to be met.
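
The slope and test statistic for this exercise can be checked by hand. The sketch below recomputes the slope from the two means and the intercept (rounded to 4.010, which reproduces the 0.1325028 printed above), then recomputes the t statistic from the table. The one-sided p-value uses a standard normal approximation, an assumption made here because the 461 degrees of freedom are large; R's own p-value comes from the t distribution, so the numbers will differ slightly.

```python
from statistics import NormalDist

# Values given in the exercise and the model summary.
mean_beauty = -0.0883
mean_eval = 3.9983
intercept = 4.010        # intercept estimate, rounded to three decimals
slope_table = 0.13300    # slope estimate from the summary table
se_slope = 0.03218       # standard error of the slope

# The least squares line passes through (mean_beauty, mean_eval).
slope_from_means = (mean_eval - intercept) / mean_beauty

# Test statistic for H0: slope = 0.
t_stat = slope_table / se_slope

# Approximate one-sided p-value for HA: slope > 0.
# Assumption: normal approximation to the t distribution with 461 df.
p_one_sided_approx = 1 - NormalDist().cdf(t_stat)

print("slope from means:", round(slope_from_means, 7))
print("t statistic:", round(t_stat, 3))
print("approximate one-sided p-value:", p_one_sided_approx)
```

Either way, the p-value is far below 0.05, which matches the conclusion that the slope is positive.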