Chapter 8 - Introduction to Linear Regression

Nutrition at Starbucks, Part I. (8.22, p. 326) The scatterplot below shows the relationship between the number of calories and amount of carbohydrates (in grams) Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we are interested in predicting the amount of carbs a menu item has based on its calorie content.

Describe the relationship between number of calories and amount of carbohydrates (in grams) that Starbucks food menu items contain.
Answer: There is a positive relationship between number of calories and amount of carbohydrates that Starbucks food menu items contain.
In this scenario, what are the explanatory and response variables?
Answer: Carbohydrates (in grams) is the explanatory variable that affect the variable Calories, which is the response variable in this scinario.
Why might we want to fit a regression line to these data?
Answer: regression line will make it easier to predict the amount of carbs (in grams) in a Starbucks food menu items by looking at its calories.
Do these data meet the conditions required for fitting a least squares line?
Answer: Linearity: According to the scatter plot, data shows a linear trend with a positive relationship.But the variability of the data around the line increases with large value of x.
Nearly normal residuals: This appear to have a nearly normal residual.
Constant variability: According to the residual plot, the variablity of points around the least square line is not constant.
Independant observation: All observations are independant of each other but limited to Starbuck food menu.
As not all the conditions including linearity and constant variablity do not meet, the least square line does not fit.

Body measurements, Part I. (8.13, p. 316) Researchers studying anthropometry collected body girth measurements and skeletal diameter measurements, as well as age, weight, height and gender for 507 physically active individuals.19 The scatterplot below shows the relationship between height and shoulder girth (over deltoid muscles), both measured in centimeters.

\begin{center}

\end{center}

Describe the relationship between shoulder girth and height.
Answer: As shoulder girth increases, height increases. There is a positive relationship between shoulder girth and height.
How would the relationship change if shoulder girth was measured in inches while the units of height remained in centimeters?
Answer: There will not be any change in the relationship as it just changing the unit to inches instead of cm.

Body measurements, Part III. (8.24, p. 326) Exercise above introduces data on shoulder girth and height of a group of individuals. The mean shoulder girth is 107.20 cm with a standard deviation of 10.37 cm. The mean height is 171.14 cm with a standard deviation of 9.41 cm. The correlation between height and shoulder girth is 0.67.

Write the equation of the regression line for predicting height.
Answer:
\(\beta_{0} = \beta_{1}x\)
Therefor \(\beta_{1} = (S_{y}/S_{x})R\)

Sx <- 10.37
Sy <- 9.41

R <- 0.67

B1 <- (Sy/Sx)*R

x_hat <- 107.2
y_hat <- 171.14

B0 <- y_hat - B1 * x_hat

B0; B1

## [1] 105.9651

## [1] 0.6079749

Equation for the regression line is \(y = 105.9651+0.6079749*X\)

Interpret the slope and the intercept in this context.
Answer: Slope: Represents the number of centimeters increase in height for each increase in shoulder girth.

Intercept: Represent the height in centimeters at girth of 0 cm.

Calculate \(R^2\) of the regression line for predicting height from shoulder girth, and interpret it in the context of the application.
Answer:

R2 <- R^2
R2

## [1] 0.4489

\(R^2=44.89\%\) of the variation of the height is explained by shoulder girth.

A randomly selected student from your class has a shoulder girth of 100 cm. Predict the height of this student using the model.
Answer:

height = 105.9651+0.6079749*100
height

## [1] 166.7626

A random student with a shoulder girth of 100cm will have a height of 166.76cm

The student from part (d) is 160 cm tall. Calculate the residual, and explain what this residual means.
Answer:

160-166.76

## [1] -6.76

Residual is 6.76, means predicted value is over compared to the observed value.

A one year old has a shoulder girth of 56 cm. Would it be appropriate to use this linear model to predict the height of this child?

Answer: In this model, minimum shoulder girth is 85cm. There for it is not appropriate to use a one year old with shoulder girth of 56cm as it would be an extrapolaration.

Cats, Part I. (8.26, p. 327) The following regression output is for predicting the heart weight (in g) of cats from their body weight (in kg). The coefficients are estimated using a dataset of 144 domestic cats.

\begin{center}

\end{center}

Write out the linear model.
Answer: As \(\beta0\) and \(\beta1\) is given in the table under “Estimate”, we can calculate the linear model as follows:
\(y = -0.357+4.034*X\)
Interpret the intercept.
Answer: As in the table intercept is -0.357, this means that the model will predict a negative heart weight when the cats body weight is 0.
Interpret the slope.
Answer: For an increase of 1kg of body weight, heart weight increases by 4.034.
Interpret \(R^2\).
Answer: For this case, according to the table \(R^2=64.66\%\).
Calculate the correlation coefficient.
Answer:

# Calculating the correlation coefficient
R2 <- 0.6466

R <- sqrt(R2)
R

## [1] 0.8041144

Rate my professor. (8.44, p. 340) Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related characteristics, such as the physical appearance of the instructor. Researchers at University of Texas, Austin collected data on teaching evaluation score (higher score means better) and standardized beauty score (a score of 0 means average, negative score means below average, and a positive score means above average) for a sample of 463 professors. The scatterplot below shows the relationship between these variables, and also provided is a regression output for predicting teaching evaluation score from beauty score.

\begin{center}

\end{center}

Given that the average standardized beauty score is -0.0883 and average teaching evaluation score is 3.9983, calculate the slope. Alternatively, the slope may be computed using just the information provided in the model summary table.
Answer:
using \(y = \beta0+\beta1 * X\)

B0 <- 4.010

x <- -0.0883
y <- 3.9983

B1 <- (y-B0)/x
B1

## [1] 0.1325028

Slope is 0.1325028

Do these data provide convincing evidence that the slope of the relationship between teaching evaluation and beauty is positive? Explain your reasoning.
Answer: in order for slope to be positive, \(\beta_0\) needs to be greater than 0. Using the Slope of the least square line, \(\beta_1 = (S_y/S_x)R\) where \(S_y>0\) and \(S_x>0\) we can conclude that the these data offer convincing evidence of the slope being positive.
List the conditions required for linear regression and check if each one is satisfied for this model based on the following diagnostic plots.
Answer: Linearity: in this, we are not provided with correlation coefficient or the \(R^{2}\). There for we will assume, with concerns, that the linear condition is satisfied.
Nearly normal residual: Theoretical Quantile graph shows that the distribution is nearly normal.
Independant observation: Since the evaluation was done anonymously, observation are independant.
Constant Variablility: The scatter plot of the residulas does appear to have a constant variablity.

Chapter 8 - Introduction to Linear Regression

Don Padmaperuma