Chapter 8 - Introduction to Linear Regression

Nutrition at Starbucks, Part I. (8.22, p. 326) The scatterplot below shows the relationship between the number of calories and amount of carbohydrates (in grams) Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we are interested in predicting the amount of carbs a menu item has based on its calorie content.

Describe the relationship between number of calories and amount of carbohydrates (in grams) that Starbucks food menu items contain.

If there is a relationship between number of calories and amount of carbs, it is positive. It may be linear, but it is not very strong. In fact, the small cluster of points in the lower left corner (low calorie, low carb) seems to be driving much of the linear trend.

In this scenario, what are the explanatory and response variables?

Number of calories is the explanatory variable. Amount of carbs is the response variable.

Why might we want to fit a regression line to these data?

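Mostly to confirm the relationship described in part (a). A regression line also makes it easy to predict the carbohydrate content of an item from its calories and to examine the residuals.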

Do these data meet the conditions required for fitting a least squares line?

Linearity: As described in part (a), there may be a weak, positive linear relationship.

Nearly normal residuals: The histogram of the residuals is not completely symmetric, so the residuals may not be nearly normal.

Constant variability: Based on the residual plot, the variability does not appear to be constant. The residuals are more spread out on the right side of the plot (higher calorie items) than on the left.

Independent observations: The observations can be treated as independent, since the nutritional content of one menu item does not depend on that of another.

Conclusion: I believe the conditions are not met, because the variability is not constant and the residuals are not nearly normal.
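The three graphical checks used above can be produced in R with `lm()`: a scatterplot with the fitted line, a residual plot, and a histogram of residuals. The sketch below assumes the Starbucks data are not available here. `calories` and `carbs` are therefore simulated, hypothetical values, and `check_conditions` is a hypothetical helper name. The noise is made to grow with calories only to mimic the kind of fan shape described above.

```r
# Hypothetical helper: fit a least squares line and draw the usual diagnostic plots
check_conditions = function(x, y) {
  fit = lm(y ~ x)
  res = residuals(fit)
  par(mfrow = c(1, 3))
  # Linearity: scatterplot with the fitted least squares line
  plot(x, y, main = "Data with LS line")
  abline(fit)
  # Constant variability: residuals should scatter evenly around 0
  plot(x, res, main = "Residuals vs x")
  abline(h = 0, lty = 2)
  # Nearly normal residuals: histogram should be roughly symmetric and bell-shaped
  hist(res, main = "Histogram of residuals")
  invisible(res)
}

# Simulated (hypothetical) data standing in for the Starbucks menu items
set.seed(1)
calories = runif(50, 80, 500)
carbs = 8 + 0.1 * calories + rnorm(50, sd = 0.03 * calories)
res = check_conditions(calories, carbs)
```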

Body measurements, Part I. (8.13, p. 316) Researchers studying anthropometry collected body girth measurements and skeletal diameter measurements, as well as age, weight, height and gender for 507 physically active individuals. The scatterplot below shows the relationship between height and shoulder girth (over deltoid muscles), both measured in centimeters.


Describe the relationship between shoulder girth and height.

There appears to be a positive, moderately strong linear relationship between shoulder girth and height.

How would the relationship change if shoulder girth was measured in inches while the units of height remained in centimeters?

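Since shoulder girth is currently measured in cm, converting it to inches means dividing each value by 2.54. This rescales (compresses) the shoulder girth axis of the plot. It does not change the form, direction, or strength of the relationship, and the correlation stays the same.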
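A quick way to confirm this is to compare the correlation before and after the unit change. The small vectors below are hypothetical values chosen only for illustration, not the study data. The sketch also shows the one thing that does change: the slope. It measures cm of height per unit of girth, so it grows by a factor of 2.54 when girth is expressed in inches.

```r
# Hypothetical shoulder girth (cm) and height (cm) values, for illustration only
girth_cm = c(95, 100, 104, 108, 112, 118, 125)
height_cm = c(160, 166, 168, 172, 171, 180, 183)

girth_in = girth_cm / 2.54   # convert shoulder girth to inches

# Correlation is unchanged by a change of units
r_cm = cor(girth_cm, height_cm)
r_in = cor(girth_in, height_cm)

# The slope is rescaled: cm of height per inch of girth is 2.54 times cm per cm
b1_cm = coef(lm(height_cm ~ girth_cm))[[2]]
b1_in = coef(lm(height_cm ~ girth_in))[[2]]

print(c(r_cm = r_cm, r_in = r_in))
print(c(b1_cm = b1_cm, b1_in = b1_in, ratio = b1_in / b1_cm))
```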

Body measurements, Part III. (8.24, p. 326) The exercise above introduces data on shoulder girth and height of a group of individuals. The mean shoulder girth is 107.20 cm with a standard deviation of 10.37 cm. The mean height is 171.14 cm with a standard deviation of 9.41 cm. The correlation between height and shoulder girth is 0.67.

Write the equation of the regression line for predicting height.

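# Summary statistics from the exercise
responseMean = 171.14      # mean height (cm)
responseSD = 9.41          # SD of height (cm)
Rvar = .67                 # correlation between shoulder girth and height
explanatoryMean = 107.2    # mean shoulder girth (cm)
explanatorySD = 10.37      # SD of shoulder girth (cm)

# Least squares slope and intercept: b1 = (s_y / s_x) * R, b0 = ybar - b1 * xbar
slope = (responseSD / explanatorySD) * Rvar
intercept = responseMean - slope * explanatoryMean

# Predicted height for a given shoulder girth x (works on vectors too)
regressionFunction = function(x, slope, intercept) {
  y = (x * slope) + intercept
  return(y)
}

girth = 1:150
tinyGraph = regressionFunction(girth, slope, intercept)
plot(girth, tinyGraph, type = 'l', xlab = 'Shoulder girth (cm)', ylab = 'Height (cm)')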

$\widehat{\text{height}} = 105.965 + 0.608 \times \text{shoulder girth}$
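The slope and intercept computed in the code above follow from the least squares formulas applied to the summary statistics. Here $x$ is shoulder girth and $y$ is height:

$$
b_1 = \frac{s_y}{s_x}\,R = \frac{9.41}{10.37}(0.67) \approx 0.608, \qquad
b_0 = \bar{y} - b_1\bar{x} = 171.14 - (0.60797)(107.20) \approx 105.97
$$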

Interpret the slope and the intercept in this context.

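Slope: for each additional centimeter of shoulder girth, the model predicts height to be about 0.608 cm greater, on average.

Intercept: a person with a shoulder girth of 0 cm is predicted to be about 105.97 cm tall. This is far outside the range of the data and has no practical meaning here; the intercept only sets the height of the line.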

Calculate $R^2$ of the regression line for predicting height from shoulder girth, and interpret it in the context of the application.

$R^2 = 0.67^2 = 0.4489$. About 44.89% of the variability in height is explained by the linear model with shoulder girth as the predictor.
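The interpretation of $R^2$ as "variability explained" comes from writing it as one minus the share of total variability left over around the line. For simple linear regression this equals the squared correlation. The sketch below checks that identity on simulated, hypothetical girth and height values, not the study data. It also shows the value for this exercise.

```r
# Simulated (hypothetical) girth/height data, for illustration only
set.seed(42)
girth = rnorm(100, mean = 107.2, sd = 10.37)
height = 105.97 + 0.608 * girth + rnorm(100, sd = 7)
fit = lm(height ~ girth)

# R^2 as the fraction of variability in height explained by the line
sse = sum(residuals(fit)^2)            # variability left over around the line
sst = sum((height - mean(height))^2)   # total variability in height
r2_from_sums = 1 - sse / sst

# For simple linear regression this equals the squared correlation
r2_from_cor = cor(girth, height)^2
print(c(r2_from_sums, r2_from_cor, summary(fit)$r.squared))

# The value for this exercise, from r = 0.67
r2_exercise = 0.67^2
print(r2_exercise)
```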

A randomly selected student from your class has a shoulder girth of 100 cm. Predict the height of this student using the model.

Plugging 100 into the function regressionFunction from part (a) gives a predicted height of about 166.76 cm (166.7625805).

The student from part (d) is 160 cm tall. Calculate the residual, and explain what this residual means.

Residual = observed − predicted = 160 − 166.76 ≈ −6.76 cm. The negative residual means the student is about 6.76 cm shorter than the model predicts.
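The prediction and residual can be reproduced directly from the summary statistics. This self-contained sketch recomputes the slope and intercept so it does not depend on earlier code.

```r
# Slope and intercept from the summary statistics in the exercise
slope = (9.41 / 10.37) * 0.67
intercept = 171.14 - slope * 107.2

predicted = intercept + slope * 100   # predicted height (cm) for a girth of 100 cm
observed = 160                        # the student's actual height (cm)
residual = observed - predicted       # residual = observed - predicted
round(c(predicted = predicted, residual = residual), 2)
```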

A one year old has a shoulder girth of 56 cm. Would it be appropriate to use this linear model to predict the height of this child?

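No. The original data only include shoulder girths from about 80 to 140 cm, so 56 cm is well outside that range and using the model would be extrapolation. Although the variability appears constant and the relationship is linear and moderately strong within the observed range, there is no guarantee that it holds for a one-year-old. It would be inappropriate, although I would use it as a rough estimate in the absence of any other model.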

Cats, Part I. (8.26, p. 327) The following regression output is for predicting the heart weight (in g) of cats from their body weight (in kg). The coefficients are estimated using a dataset of 144 domestic cats.

Write out the linear model.

b0 <- -0.357   # intercept from the regression output (g)
b1 <- 4.034    # slope: grams of heart weight per kg of body weight

$\widehat{\text{heart weight}} = -0.357 + 4.034 \times \text{body weight}$

Interpret the intercept.

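If a cat weighed 0 kg, the model would predict a heart weight of −0.357 g. This has no practical meaning, since no cat weighs 0 kg; the intercept only sets the height of the line.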

Interpret the slope.

For each additional kg of body weight, the cat's heart is predicted to weigh about 4.034 g more, on average.

Interpret $R^2$.

$R^2 = 64.66\%$: about 64.66% of the variability in heart weight is explained by the linear model with body weight as the predictor.

Calculate the correlation coefficient.

r2 <- .6466
corr = sqrt(r2)   # r = +sqrt(R^2) because the slope (4.034) is positive
corr

## [1] 0.8041144
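The whole Cats analysis can be reproduced if the data are available. This sketch assumes that the 144 cats are the `cats` data frame shipped with the MASS package (body weight `Bwt` in kg, heart weight `Hwt` in g). That match is an assumption and is not stated in the exercise. The last line shows why the sign of $r$ follows the sign of the slope.

```r
# Assumption: the exercise's 144 cats are MASS::cats (Bwt in kg, Hwt in g)
library(MASS)

fit = lm(Hwt ~ Bwt, data = cats)
coef(fit)                   # intercept and slope
summary(fit)$r.squared      # R^2

# Correlation equals sign(slope) * sqrt(R^2) in simple linear regression
r = cor(cats$Bwt, cats$Hwt)
r
```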

Rate my professor. (8.44, p. 340) Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related characteristics, such as the physical appearance of the instructor. Researchers at the University of Texas at Austin collected data on teaching evaluation score (higher score means better) and standardized beauty score (a score of 0 means average, negative score means below average, and a positive score means above average) for a sample of 463 professors. The scatterplot below shows the relationship between these variables, and also provided is a regression output for predicting teaching evaluation score from beauty score.

Given that the average standardized beauty score is -0.0883 and average teaching evaluation score is 3.9983, calculate the slope. Alternatively, the slope may be computed using just the information provided in the model summary table.

b0 <- 4.010          # intercept from the regression output
b1 <- 4.13 * 0.0322  # slope estimate = t value * standard error
b1

## [1] 0.132986
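The question also mentions a second route to the slope. The least squares line always passes through the point of means $(\bar{x}, \bar{y})$, so $\bar{y} = b_0 + b_1\bar{x}$ can be solved for $b_1$. The sketch below computes the slope both ways; the two agree up to rounding in the reported values.

```r
b0 = 4.010       # intercept from the regression output
x_bar = -0.0883  # average standardized beauty score
y_bar = 3.9983   # average teaching evaluation score

# The least squares line passes through (x_bar, y_bar): y_bar = b0 + b1 * x_bar
b1_from_means = (y_bar - b0) / x_bar

# From the summary table: t value = estimate / SE, so estimate = t * SE
b1_from_table = 4.13 * 0.0322

round(c(b1_from_means, b1_from_table), 4)
```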

Do these data provide convincing evidence that the slope of the relationship between teaching evaluation and beauty is positive? Explain your reasoning.

A positive sample slope alone is not enough, so we set up a hypothesis test with H0: β1 = 0 and HA: β1 > 0. The regression output gives a t value of 4.13 for the slope with a p-value close to zero, which leads us to reject the null hypothesis. The data provide convincing evidence that the slope is positive, that is, higher beauty scores are associated with higher teaching evaluations.
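The one-sided p-value for HA: β1 > 0 is the upper-tail area of a t distribution with n − 2 degrees of freedom beyond the observed t value. This sketch uses the t value from the regression output and n = 463 professors.

```r
t_stat = 4.13    # t value for the slope, from the regression output
df = 463 - 2     # n - 2 degrees of freedom for simple linear regression

# One-sided p-value for HA: beta1 > 0 (upper tail of the t distribution)
p_one_sided = pt(t_stat, df = df, lower.tail = FALSE)
p_one_sided
p_one_sided < 0.05   # reject H0 at the 5% level if TRUE
```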

List the conditions required for linear regression and check if each one is satisfied for this model based on the following diagnostic plots.

Linearity: the scatterplot shows a weak, positive linear relationship, so we accept that this condition is met.

Nearly normal residuals: based on the histogram, the distribution of the residuals appears nearly normal.

Constant variability: the spread of the points in the scatterplot does not appear to be constant.

Independent observations: the observations are assumed to be independent.


Erinda Budo