HW7_606: Introduction to Linear Regression

Graded: 7.24, 7.26, 7.30, 7.40

7.24 Nutrition at Starbucks, Part I. The scatterplot below shows the relationship between the number of calories and amount of carbohydrates (in grams) Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we are interested in predicting the amount of carbs a menu item has based on its calorie content.

Describe the relationship between number of calories and amount of carbohydrates (in grams) that Starbucks food menu items contain.

Ans: There is a positive, weak, linear association between the number of calories and amount of carbohydrates.

In this scenario, what are the explanatory and response variables?

Ans: Calories is the explanatory variable, and carbohydrates (in grams) is the response variable.

Why might we want to fit a regression line to these data?

Ans: We can use a regression line to predict the amount of carbohydrates for a given number of calories. This may be useful for estimating the carbohydrate content of a menu item from its calorie count.

Do these data meet the conditions required for fitting a least squares line?

Ans: The relationship appears linear in the scatterplot, and the residual plot shows no obvious pattern, so it is reasonable to try to fit a linear model to the data. However, it is unclear whether there is statistically significant evidence that the slope parameter is different from zero. The histogram of the residuals shows a fat tail on the right side, which suggests there may be hidden structure that is not evident in the current plots but is important to consider.

7.26 Body measurements, Part III. Exercise 7.15 introduces data on shoulder girth and height of a group of individuals. The mean shoulder girth is 107.20 cm with a standard deviation of 10.37 cm. The mean height is 171.14 cm with a standard deviation of 9.41 cm. The correlation between height and shoulder girth is 0.67.

Write the equation of the regression line for predicting height.

\(Slope = r * s_{\text{height}}/s_{\text{shoulder girth}} = 0.67 * 9.41/10.37 = 0.608\)

\[ \hat{y}_i = b_0 + 0.608 * x_i \]

\[ 171.14 = b_0 + 0.608 * 107.20 \Rightarrow b_0 = 105.96 \]

regression line:

\[ \widehat{\text{height}}_i = 105.96 + 0.608 * \text{shoulder girth}_i \]

Interpret the slope and the intercept in this context.

Ans: Slope: For each additional cm of shoulder girth, the model predicts an additional 0.608 cm in height. Intercept: When shoulder girth is 0 cm, the height is expected to be 105.96 cm. A shoulder girth of 0 cm does not make sense in this context, so the intercept only serves to adjust the height of the line.

Calculate \(R^2\) of the regression line for predicting height from shoulder girth, and interpret it in the context of the application.

Ans: \(R^2 = 0.67^2 = 0.4489\). About 44.89% of the variability in height is accounted for by the model, i.e., explained by shoulder girth.

A randomly selected student from your class has a shoulder girth of 100 cm. Predict the height of this student using the model.

\[ \widehat{\text{height}} = 105.96 + 0.608 * 100 = 166.76 \text{ cm} \]

The student from part (d) is 160 cm tall. Calculate the residual, and explain what this residual means.

\[ e_i = 160 - 166.76 = -6.76 \text{ cm} \]

A negative residual means that the model overestimates the height.

A one year old has a shoulder girth of 56 cm. Would it be appropriate to use this linear model to predict the height of this child?

Ans: No. The data contain no observations with a shoulder girth below about 90 cm (or a height below about 150 cm), so predicting for a shoulder girth of 56 cm would be extrapolation. We do not know whether the model can be used to predict the height of this child.

7.30 Cats, Part I. The following regression output is for predicting the heart weight (in g) of cats from their body weight (in kg). The coefficients are estimated using a dataset of 144 domestic cats.

| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | -0.357 | 0.692 | -0.515 | 0.607 |
| body wt | 4.034 | 0.250 | 16.119 | 0.000 |

s = 1.452, \(R^2\) = 64.66%, \(R^2_{adj}\) = 64.41%
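As a quick sanity check on the regression output, the t values can be approximately recovered from the other two columns. This is only a sketch, assuming the usual test of each coefficient against a null value of 0:

\[ t = \frac{\text{Estimate}}{\text{Std. Error}}: \qquad \frac{-0.357}{0.692} \approx -0.516, \qquad \frac{4.034}{0.250} \approx 16.14 \]

These agree with the reported -0.515 and 16.119 up to rounding of the displayed estimates and standard errors.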

Write out the linear model.

regression line:

\[ \widehat{\text{heart weight}}_i = -0.357 + 4.034 * \text{body weight}_i \]

Interpret the intercept.

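Ans: The expected heart weight of a cat with a body weight of 0 kg is -0.357 g. This value is not meaningful in itself; the intercept only serves to adjust the height of the line.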

Interpret the slope.

Ans: For each additional kg of body weight, the model predicts an additional 4.034 g of heart weight, on average.

Interpret \(R^2\).

Ans: Body weight explains 64.66% of the variability in heart weight.
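The adjusted value in the regression output can be reproduced from \(R^2\). The following is a sketch assuming the usual adjustment formula, with n = 144 cats and k = 1 predictor (the symbols n and k are notation introduced here, not part of the exercise):

\[ R^2_{adj} = 1 - (1 - R^2) \times \frac{n - 1}{n - k - 1} = 1 - (1 - 0.6466) \times \frac{143}{142} \approx 0.6441 \]

This matches the reported 64.41%. With a single predictor and a sample of this size, the adjustment changes \(R^2\) very little.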

Calculate the correlation coefficient.

sqrt(0.6466)

## [1] 0.8041144
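The square root of \(R^2\) only gives the magnitude of the correlation; the sign is taken from the slope, which is positive here (4.034). As a brief worked check:

\[ r = \operatorname{sign}(b_1) \sqrt{R^2} = +\sqrt{0.6466} \approx 0.804 \]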

7.40 Rate my professor. Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related characteristics, such as the physical appearance of the instructor. Researchers at University of Texas, Austin collected data on teaching evaluation score (higher score means better) and standardized beauty score (a score of 0 means average, negative score means below average, and a positive score means above average) for a sample of 463 professors. The scatterplot below shows the relationship between these variables, and also provided is a regression output for predicting teaching evaluation score from beauty score.

Given that the average standardized beauty score is -0.0883 and average teaching evaluation score is 3.9983, calculate the slope. Alternatively, the slope may be computed using just the information provided in the model summary table.

regression line:

\[ \widehat{\text{teaching eval}}_i = 4.01 + b_1 * \text{beauty}_i \]

\[ 3.9983 = 4.01 + b_1 * (-0.0883) \Rightarrow b_1 = 0.1325 \]
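The substitution above works because a least squares line always passes through the point of means. Written out step by step, with \(\bar{x}\) the average beauty score and \(\bar{y}\) the average teaching evaluation score (notation introduced here for illustration):

\[ \bar{y} = b_0 + b_1 \bar{x} \quad \Rightarrow \quad b_1 = \frac{\bar{y} - b_0}{\bar{x}} = \frac{3.9983 - 4.01}{-0.0883} = \frac{-0.0117}{-0.0883} \approx 0.1325 \]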

Do these data provide convincing evidence that the slope of the relationship between teaching evaluation and beauty is positive? Explain your reasoning.

Ans: No. The scatterplot shows no linear relation between beauty and teaching evaluation.

List the conditions required for linear regression and check if each one is satisfied for this model based on the following diagnostic plots.

Ans: The following four conditions are required for linear regression: 1. Linearity, 2. Nearly normal residuals, 3. Constant variability, and 4. Independent observations.

picture 1: The scatterplot of the residuals against beauty shows some outliers: the majority of points lie between -1 and 1, and a few fall below -1. The residuals are nearly normal.

picture 2: The histogram shows that the residuals are not normal; the distribution is right-skewed.

picture 3: The normal Q-Q plot shows a very strong, positive, linear relationship between the theoretical quantiles and the sample quantiles, with many points falling on the line.

picture 4: The scatterplot of the residuals against the order of data collection shows many points in the range -0.5 to 0.5, but also many outliers below -0.5. The residuals are not normal.

Chunhui Zhu

November 6, 2017
