DATA 606 Chapter 7 Assignment

Chapter 7 - Introduction to Linear Regression
Graded: 7.24, 7.26, 7.30, 7.40


7.24

a. The scatterplot suggests a strong, positive relationship between calories and carbohydrates: items with more calories tend to have more carbs.

By the way the graph is laid out, calories is the explanatory variable and carbs (in grams) is the response variable. I would argue, however, that carbs should be the explanatory variable, because the carbohydrates in a food contribute to its calories. Calories alone are a weaker predictor of carbs, because a high calorie count could also reflect fat or protein.
We could fit a regression line to predict the number of calories from the number of carbs.
The conditions that should be met are:

Linearity: the data show a positive linear trend.

Nearly normal residuals: the distribution of the residuals is slightly right skewed, but overall it is close to normal.

Constant variability: the variability is not constant. Residuals on the far right are substantially larger than those on the left, which suggests that a linear regression might not be the best fit.

Independent observations: independence cannot be determined from the plots alone, since it depends on how the items were sampled. The larger residuals at higher calorie values are a problem of non-constant variability rather than evidence of dependence between observations.
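The constant-variability check can be illustrated with a small sketch. It uses simulated data, not the values from the exercise, and the names `calories`, `carbs`, `sd_low`, and `sd_high` are chosen here for illustration only. The idea is that when the noise grows with the explanatory variable, the residual spread in the upper half of the x range is clearly larger than in the lower half.

```r
# Simulated data only: these are NOT the values from the exercise.
set.seed(42)
n <- 200
calories <- runif(n, min = 100, max = 500)
# Noise standard deviation grows with calories, which violates constant variability.
carbs <- 10 + 0.1 * calories + rnorm(n, mean = 0, sd = 0.03 * calories)

fit <- lm(carbs ~ calories)
res <- resid(fit)

# Compare the residual spread in the lower and upper halves of the x range.
is_low <- calories <= median(calories)
sd_low <- sd(res[is_low])
sd_high <- sd(res[!is_low])
cat("Residual SD, lower half:", round(sd_low, 2), "\n")
cat("Residual SD, upper half:", round(sd_high, 2), "\n")

# plot(fitted(fit), res) would show the same pattern as a fan shape.
```

A ratio of `sd_high` to `sd_low` well above 1 matches what the residual plot shows visually; it is a rough diagnostic, not a formal test.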

7.26

Explanatory variable: shoulder girth (in cm)

Response variable: height (in cm)

\[ \widehat{\text{height}} = \beta_{0} + \beta_{1} \cdot \text{shoulder girth} \]

\[ b_{1} = \frac{s_{y}}{s_{x}} \cdot R = \frac{9.41}{10.37} \cdot 0.67 = 0.608 \]

\[ y - \bar{y} = b_{1} \cdot (x - \bar{x}) \]

\[ y - 171.14 = 0.608 \cdot (x - 107.20) \]

\[ y = 0.608 \cdot x - 0.608 \cdot 107.20 + 171.14 = 105.9624 + 0.608 \cdot x \]

\[ \widehat{\text{height}} = 105.9624 + 0.608 \cdot \text{shoulder girth} \]

Slope = 0.608

This indicates that each additional cm of shoulder girth is associated with an additional 0.608 cm of height, as predicted by the linear model.

Intercept = 105.965 (105.9624 when the slope is first rounded to 0.608)

This indicates that, if the linear model is correct, a shoulder girth of 0 cm is associated with a height of about 106.0 cm. However, a shoulder girth of 0 cm is far outside the range of observations in the dataset, so the intercept has no practical meaning here; it simply sets the vertical position of the regression line.

\[ R^{2} = 0.67^{2} = 0.4489 \]

The R² of 0.449 indicates that about 45% of the variability in the response variable (height) is explained by the explanatory variable (shoulder girth) under the linear model.

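# Summary statistics from the exercise (x = shoulder girth, y = height, both in cm)
mean_x <- 107.20
stddev_x <- 10.37
mean_y <- 171.14
stddev_y <- 9.41
R <- 0.67
# Slope: b1 = (s_y / s_x) * R
(b1 <- stddev_y / stddev_x * R)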

## [1] 0.6079749

# Intercept: the line passes through the point of means (mean_x, mean_y)
(b0 <- mean_y - b1 * mean_x)

## [1] 105.9651

For a shoulder girth of 100 cm:

\[ \widehat{\text{height}} = 105.9624 + 0.608 \cdot 100 = 166.7624 \]

The model predicts a height of about 166.8 cm.

\[ e_{100} = 160 - 166.7624 = -6.7624 \]

The student is actually 160 cm tall, so the residual is about -6.8 cm. A negative residual means that the model overpredicted the student's height.
No, it would not be appropriate to apply the linear model in this case. In the dataset on which the model was estimated, shoulder girths range from roughly 85 to 135 cm and heights from roughly 145 to 200 cm. A shoulder girth of 56 cm for a one-year-old is well outside that range, so using the model here would be extrapolation.
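The prediction, the residual, and the extrapolation check can be sketched in R as follows. The helpers `predict_height` and `in_range` are hypothetical names introduced here, and the 85 to 135 cm bounds are the approximate values read off the scatterplot, not exact limits. Because the slope is not rounded first, the prediction differs from the hand calculation in the fourth decimal place.

```r
# Summary statistics quoted in the exercise (all in cm)
mean_x <- 107.20; stddev_x <- 10.37   # shoulder girth
mean_y <- 171.14; stddev_y <- 9.41    # height
R <- 0.67

b1 <- stddev_y / stddev_x * R   # slope
b0 <- mean_y - b1 * mean_x      # intercept

# Hypothetical helper: predicted height for a given shoulder girth
predict_height <- function(girth) b0 + b1 * girth

pred_100 <- predict_height(100)
resid_100 <- 160 - pred_100     # observed minus predicted
cat("Predicted height:", round(pred_100, 1), "cm\n")
cat("Residual:", round(resid_100, 1), "cm\n")

# Hypothetical helper: is a girth inside the (approximate) observed range?
in_range <- function(girth, lo = 85, hi = 135) girth >= lo & girth <= hi
in_range(56)    # the one-year-old's shoulder girth
```

A `FALSE` from `in_range(56)` flags the prediction for the one-year-old as an extrapolation.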

7.30

Explanatory variable: body weight (in kg)

Response variable: heart weight (in g)

\[ \widehat{\text{heart weight}}\ \text{(g)} = -0.357 + 4.034 \cdot \text{body weight (kg)} \]

Intercept: -0.357. For a body weight of 0 kg, the expected heart weight is -0.357 g. In this context, the intercept has no meaningful interpretation.

Slope: 4.034. There is a positive relationship between body weight and heart weight: each additional kg of body weight is associated with an expected increase of 4.034 g in heart weight.

\[ R^{2} = 0.6466 \]

About 64.66% of the variability in heart weight is explained by body weight.

\[ R = \sqrt{R^{2}} = \sqrt{0.6466} = 0.804 \]

This is the correlation coefficient. It is positive because the slope is positive.
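As a quick check, the correlation coefficient can be recovered from R² in R, taking the 64.66% figure quoted above and giving the result the sign of the slope. The body weight of 3 kg used for the prediction is a made-up value for illustration, not one from the exercise.

```r
b0 <- -0.357        # intercept (g)
b1 <- 4.034         # slope (g per kg)
r_squared <- 0.6466 # 64.66% of variability explained

# The square root only gives the magnitude; the sign comes from the slope.
r <- sign(b1) * sqrt(r_squared)
cat("Correlation coefficient:", round(r, 3), "\n")

# Predicted heart weight for a hypothetical body weight of 3 kg
body_weight <- 3
heart_weight_hat <- b0 + b1 * body_weight
cat("Predicted heart weight:", round(heart_weight_hat, 3), "g\n")
```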

7.40

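a. The regression line passes through the point of means, (-0.0883, 3.9983), so the slope follows from the intercept:

\[ \hat{y} = 4.010 + \beta_{1} \cdot x \]

\[ 3.9983 = 4.010 + \beta_{1} \cdot (-0.0883) \]

\[ \beta_{1} = \frac{3.9983 - 4.010}{-0.0883} = 0.1325028 \]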
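The same arithmetic can be reproduced in R. The variable names are chosen here for illustration, and the sketch assumes that -0.0883 and 3.9983 are the means of the explanatory and response variables, as the equations above imply.

```r
b0 <- 4.010        # intercept of the fitted line
mean_x <- -0.0883  # assumed mean of the explanatory variable
mean_y <- 3.9983   # assumed mean of the response variable

# The least-squares line passes through (mean_x, mean_y), so solve for the slope.
b1 <- (mean_y - b0) / mean_x
cat("Slope:", round(b1, 7), "\n")

# Sanity check: plugging the slope back in should reproduce mean_y.
check <- b0 + b1 * mean_x
```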

b. Yes. Although the slope is small, it is positive. The p-value for the slope is close to zero, so we can reject the null hypothesis that the slope is zero; the data provide convincing evidence of a positive slope.

Conditions for linear regression

Linearity - There is a weak linear relationship between the professors' evaluation scores and their beauty scores.

Nearly normal residuals - The distribution of the residuals looks close to normal, with perhaps a very slight left skew.

Constant variability - The residual scatterplot does not show any pattern. The variability appears constant across all values, left to right.

Independent observations - There is no indication of dependence in the residual plots.