Chapter 8 - Introduction to Linear Regression

Nutrition at Starbucks, Part I. (8.22, p. 326) The scatterplot below shows the relationship between the number of calories and amount of carbohydrates (in grams) Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we are interested in predicting the amount of carbs a menu item has based on its calorie content.

Describe the relationship between number of calories and amount of carbohydrates (in grams) that Starbucks food menu items contain. The relationship between calories and carbs is positive overall: items with more calories tend to contain more carbohydrates. The variability is larger at higher calorie levels, where the points scatter more widely on both sides of the regression line.
In this scenario, what are the explanatory and response variables? The explanatory variable is calories (x-axis) and the response variable is carbohydrates in grams (y-axis).
Why might we want to fit a regression line to these data? A regression line summarizes the relationship between the two variables and lets us predict the amount of carbs in a menu item from its calorie count, which is the only figure listed on the display items.
Do these data meet the conditions required for fitting a least squares line? Linearity: the scatterplot shows a roughly linear relationship between calories and carbs. Nearly normal residuals: the histogram of residuals looks approximately normal. Independent observations: we assume the menu items are independent of one another. Constant variability: this condition is questionable, because the spread of the points around the line appears to grow at higher calorie levels, so predictions for high-calorie items should be treated with caution.
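One way to probe the constant variability condition numerically is to compare the spread of the residuals at low and high values of the explanatory variable. The sketch below uses simulated data, not the Starbucks menu data (which is not reproduced here). The names `calories` and `carbs`, the coefficients, and the noise model are all assumptions chosen only to illustrate a fan-shaped pattern.

```r
# Simulated data (hypothetical values, not the real Starbucks menu data)
set.seed(42)
calories <- runif(200, min = 80, max = 500)
# Noise SD grows with calories, which violates constant variability
carbs <- 8 + 0.1 * calories + rnorm(200, mean = 0, sd = 0.03 * calories)

fit <- lm(carbs ~ calories)
res <- resid(fit)

# Compare residual spread below and above the median calorie count
low <- calories <= median(calories)
sd_low <- sd(res[low])
sd_high <- sd(res[!low])
cat("Residual SD (low calories): ", round(sd_low, 2), "\n")
cat("Residual SD (high calories):", round(sd_high, 2), "\n")
```

A clearly larger residual SD in the high-calorie half suggests the condition is doubtful. Plotting `res` against `fitted(fit)` would show the same pattern visually.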

Body measurements, Part I. (8.13, p. 316) Researchers studying anthropometry collected body girth measurements and skeletal diameter measurements, as well as age, weight, height and gender for 507 physically active individuals. The scatterplot below shows the relationship between height and shoulder girth (over deltoid muscles), both measured in centimeters.


Describe the relationship between shoulder girth and height. There is a positive relationship between shoulder girth and height: as the scatterplot shows, people with larger shoulder girth tend to be taller.
How would the relationship change if shoulder girth was measured in inches while the units of height remained in centimeters? The direction and strength of the relationship would stay the same, because the correlation does not depend on units. Only the slope would change: one inch equals 2.54 cm, so the slope would be 2.54 times steeper, reflecting the larger change in height for each inch of shoulder girth.
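The effect of the unit change can be checked with a small simulation. The data below are made up for illustration (the coefficients and noise level are assumptions, not the study's measurements). The point is only that the correlation is unchanged while the slope scales by 2.54.

```r
# Simulated measurements (hypothetical values, not the study data)
set.seed(1)
girth_cm <- rnorm(200, mean = 107.20, sd = 10.37)
height_cm <- 106 + 0.61 * girth_cm + rnorm(200, mean = 0, sd = 7)

girth_in <- girth_cm / 2.54   # convert shoulder girth to inches

slope_cm <- unname(coef(lm(height_cm ~ girth_cm))[2])
slope_in <- unname(coef(lm(height_cm ~ girth_in))[2])

cor_cm <- cor(girth_cm, height_cm)
cor_in <- cor(girth_in, height_cm)

cat("Correlation (cm, in):", cor_cm, cor_in, "\n")          # the two values agree
cat("Slope ratio (in / cm):", slope_in / slope_cm, "\n")    # 2.54 up to floating-point error
```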

Body measurements, Part III. (8.24, p. 326) The exercise above introduces data on shoulder girth and height of a group of individuals. The mean shoulder girth is 107.20 cm with a standard deviation of 10.37 cm. The mean height is 171.14 cm with a standard deviation of 9.41 cm. The correlation between height and shoulder girth is 0.67.

Write the equation of the regression line for predicting height.

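# Summary statistics given in the exercise
sd_x <- 10.37   # SD of shoulder girth (cm)
sd_y <- 9.41    # SD of height (cm)
r <- 0.67       # correlation between height and shoulder girth

mnx <- 107.20   # mean shoulder girth (cm)
mny <- 171.14   # mean height (cm)

m <- (sd_y / sd_x) * r   # slope
b <- mny - (m * mnx)     # intercept: the line passes through the two means
cat(b, m)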

## 105.9651 0.6079749

\(\widehat{\text{height}} = 105.9651 + 0.6079749 \times \text{shoulder girth}\)
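For reference, the slope and intercept computed above follow from the standard least squares identities, where \(s_x\) and \(s_y\) are the standard deviations of shoulder girth and height, \(r\) is their correlation, and \(\bar{x}\), \(\bar{y}\) are the means. The symbols \(b_0\) and \(b_1\) are notation introduced here for the intercept and slope.

```latex
\begin{align*}
b_1 &= r \cdot \frac{s_y}{s_x} = 0.67 \times \frac{9.41}{10.37} \approx 0.60797 \\
b_0 &= \bar{y} - b_1 \bar{x} = 171.14 - 0.60797 \times 107.20 \approx 105.97
\end{align*}
```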

Interpret the slope and the intercept in this context. The slope means that for every 1 cm increase in shoulder girth, the predicted height increases by about 0.608 cm on average.

The intercept means that a person with a shoulder girth of 0 cm would have a predicted height of 105.97 cm. A shoulder girth of 0 cm is impossible, so the intercept has no practical meaning here. It only serves to set the height of the line.

Calculate \(R^2\) of the regression line for predicting height from shoulder girth, and interpret it in the context of the application. \(R^2 = 0.67^2 = 0.4489\). About 44.89% of the variability in height is explained by the linear model using shoulder girth.
A randomly selected student from your class has a shoulder girth of 100 cm. Predict the height of this student using the model. The predicted height of a student having shoulder girth 100cm is around 166.76cm.

m*100+b

## [1] 166.7626

The student from part (d) is 160 cm tall. Calculate the residual, and explain what this residual means. The residual is observed minus predicted: 160 - 166.7626 = -6.7626. The negative residual means the model overestimates the height of this student by about 6.76 cm.
A one year old has a shoulder girth of 56 cm. Would it be appropriate to use this linear model to predict the height of this child? No. A shoulder girth of 56 cm is almost 5 standard deviations below the mean, (56 - 107.20) / 10.37 = -4.94, which is far outside the range of the observed data. Using the model to predict the height of this child would be extrapolation.
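The quantities used in the answers above can be recomputed in one place from the summary statistics. This is a minimal sketch; the variable names are chosen here for readability and are not part of the original solution.

```r
# Summary statistics given in the exercise
mean_girth <- 107.20; sd_girth <- 10.37    # shoulder girth (cm)
mean_height <- 171.14; sd_height <- 9.41   # height (cm)
r <- 0.67                                  # correlation

slope <- r * sd_height / sd_girth
intercept <- mean_height - slope * mean_girth

r_squared <- r^2                          # share of variability explained
predicted <- intercept + slope * 100      # predicted height at 100 cm girth
residual <- 160 - predicted               # observed minus predicted
z_child <- (56 - mean_girth) / sd_girth   # SDs between 56 cm and the mean girth

round(c(r_squared = r_squared, predicted = predicted,
        residual = residual, z_child = z_child), 4)
```

A negative `residual` indicates an overestimate, and a `z_child` near -5 flags the 56 cm girth as far outside the observed range.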

Cats, Part I. (8.26, p. 327) The following regression output is for predicting the heart weight (in g) of cats from their body weight (in kg). The coefficients are estimated using a dataset of 144 domestic cats.

\begin{center}

## 
## Call:
## lm(formula = cats$Hwt ~ cats$Bwt)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5694 -0.9634 -0.0921  1.0426  5.1238 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -0.3567     0.6923  -0.515    0.607    
## cats$Bwt      4.0341     0.2503  16.119   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.452 on 142 degrees of freedom
## Multiple R-squared:  0.6466, Adjusted R-squared:  0.6441 
## F-statistic: 259.8 on 1 and 142 DF,  p-value: < 2.2e-16

\end{center}

Write out the linear model. \(\widehat{\text{heart weight}} = -0.357 + 4.034 \times \text{body weight}\)
Interpret the intercept. The intercept of -0.357 means that a cat with a body weight of 0 kg would have a predicted heart weight of -0.357 grams. This is not a meaningful value, since neither a 0 kg cat nor a negative heart weight is possible.
Interpret the slope. The slope of 4.034 means that for each additional kilogram of body weight, we expect the heart weight to be higher by about 4.034 grams on average.
Interpret \(R^2\). Body weight explains about 64.66% of the variability in heart weight in this linear model.
Calculate the correlation coefficient. Because the slope is positive, the correlation is the positive square root of \(R^2\), about 0.804.

sqrt(0.6466)

## [1] 0.8041144
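Taking a square root loses the sign of the correlation, so a slightly more careful version attaches the sign of the slope explicitly. The sketch below uses the values from the regression output; the variable names are chosen here for illustration.

```r
r_squared <- 0.6466   # Multiple R-squared from the regression output
slope <- 4.0341       # estimated coefficient for body weight

# The correlation carries the same sign as the slope
r <- sign(slope) * sqrt(r_squared)
r
```

For a model with a negative slope, the same expression would return a negative correlation.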

Rate my professor. (8.44, p. 340) Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related characteristics, such as the physical appearance of the instructor. Researchers at University of Texas, Austin collected data on teaching evaluation score (higher score means better) and standardized beauty score (a score of 0 means average, negative score means below average, and a positive score means above average) for a sample of 463 professors. The scatterplot below shows the relationship between these variables, and also provided is a regression output for predicting teaching evaluation score from beauty score.


Given that the average standardized beauty score is -0.0883 and average teaching evaluation score is 3.9983, calculate the slope. Alternatively, the slope may be computed using just the information provided in the model summary table.

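The slope is approximately 0.133, as reported in the summary table. It can also be recovered from the averages: the least squares line passes through the point of averages, so \(b_1 = (\bar{y} - b_0) / \bar{x}\), where \(b_0\) is the intercept.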

mnx <- -0.0883
mny <- 3.9983
summary(m_eval_beauty)

## 
## Call:
## lm(formula = eval ~ beauty)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.80015 -0.36304  0.07254  0.40207  1.10373 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.01002    0.02551 157.205  < 2e-16 ***
## beauty       0.13300    0.03218   4.133 4.25e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5455 on 461 degrees of freedom
## Multiple R-squared:  0.03574,    Adjusted R-squared:  0.03364 
## F-statistic: 17.08 on 1 and 461 DF,  p-value: 4.247e-05

m <- (mny - 4.010) / mnx   # slope from the point of averages and the intercept
m

## [1] 0.1325028

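Do these data provide convincing evidence that the slope of the relationship between teaching evaluation and beauty is positive? Explain your reasoning. Yes. Testing \(H_0: \beta_1 = 0\) against \(H_A: \beta_1 > 0\), the summary table gives a t statistic of 4.133 for the beauty coefficient with a two-sided p-value of 4.25e-05, so the one-sided p-value is about half of that. Because this is far below 0.05, we reject \(H_0\): the data provide convincing evidence that the slope is positive.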
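A one-sided test of the slope can be sketched directly from the numbers in the summary table. The variable names below are chosen here for illustration; only the estimate, standard error, and degrees of freedom come from the regression output.

```r
# Values read from the regression summary table
estimate <- 0.13300    # slope for beauty
std_error <- 0.03218   # standard error of the slope
df <- 461              # residual degrees of freedom

t_stat <- estimate / std_error   # test statistic for H0: slope = 0
# Upper-tail probability for HA: slope > 0
p_one_sided <- pt(t_stat, df = df, lower.tail = FALSE)

cat("t =", round(t_stat, 3), "\n")
cat("one-sided p-value =", signif(p_one_sided, 3), "\n")
```

Doubling `p_one_sided` should approximately reproduce the two-sided p-value that `summary()` reports for the beauty coefficient.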
List the conditions required for linear regression and check if each one is satisfied for this model based on the following diagnostic plots.

Constant variability: The residual plot shows roughly constant variability around zero.

Independent observations: We assume the professors in the sample are independent of one another.

Nearly normal residuals: The histogram of residuals is nearly normal.

Linearity: The scatterplot shows a roughly linear relationship between beauty score and teaching evaluation.

Dominika Markowska-Desvallons