Multiple Linear Regression

1.


  1. The form of the linear model is \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon\).

    The values of the regression coefficients are:

    \(\beta_0\) = 2

    \(\beta_1\) = 2

    \(\beta_2\) = 0.3

    So this model can be written as \(y = 2 + 2x_1 + 0.3x_2 + \epsilon\).

    The error variance is \(\sigma^2 = 1^2 = 1\).
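
    The data-generating code is not reproduced in this write-up, but a minimal R sketch consistent with these values might look as follows. The distributions of x1 and x2 and the sample size are assumptions, not given in the exercise; only the coefficients and \(\sigma\) above are.

```r
# Hypothetical simulation consistent with part (a); distributions assumed.
set.seed(1)
n   <- 100                          # assumed sample size
x1  <- runif(n)                     # assumed: uniform on [0, 1]
x2  <- rnorm(n, mean = 15, sd = 5)  # assumed; a mean near 15 is consistent
                                    # with part (d)'s intercept 6.52 = 2 + 0.3 * 15
eps <- rnorm(n, sd = 1)             # sigma = 1, so sigma^2 = 1
y   <- 2 + 2 * x1 + 0.3 * x2 + eps  # true model from part (a)
```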


  2. The correlation coefficient between x1 and x2 is 0.0170321.

    The scatter plot displaying the relationship between the variables x1 and x2 is presented below:
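
    Assuming the simulated vectors from the sketch above, the correlation and scatter plot can be produced with base R:

```r
# Part (b): correlation and scatter plot of the two predictors.
cor(x1, x2)   # near 0 in Exercise 1: the predictors are essentially unrelated
plot(x1, x2, xlab = "x1", ylab = "x2",
     main = "x2 versus x1 (Exercise 1)")
```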


  3. The results after fitting a least squares regression to predict y using x1 and x2 are as follows:

    The values of \(\hat\beta_0\), \(\hat\beta_1\), and \(\hat\beta_2\) are shown below, and we can see that these estimates are closely related to the true \(\beta\) values, as each is a very close approximation:

    \(\hat\beta_0\) \(\approx\) 1.9763 \(\approx\) \(\beta_0\) = 2 (close)

    \(\hat\beta_1\) \(\approx\) 1.9307 \(\approx\) \(\beta_1\) = 2 (close)

    \(\hat\beta_2\) \(\approx\) 0.3014 \(\approx\) \(\beta_2\) = 0.3 (close)

    The residual standard error is s = 0.9675. This value is related to the true \(\sigma^2\) because it closely approximates \(\sigma = \sqrt{\sigma^2} = 1\).

    Yes, we can reject the null hypothesis \(H_0\): \(\beta_1\) = 0 because the summary shows its corresponding p-value is p \(\approx 6.89 \times 10^{-7}\), which is less than any reasonable significance level. Hence, there is evidence that x1 is a statistically significant variable.

    Yes, we can reject the null hypothesis \(H_0\): \(\beta_2\) = 0 because the summary shows its corresponding p-value is p \(\approx 3.33 \times 10^{-13}\), which is less than any reasonable significance level. Hence, there is evidence that x2 is a statistically significant variable.
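
    A sketch of the fitting and summary steps for part (c), assuming the simulated data above (the object name fit_c is mine, not from the exercise):

```r
# Part (c): least squares fit of y on both predictors.
fit_c <- lm(y ~ x1 + x2)
summary(fit_c)        # coefficient table with t statistics and p-values
coef(fit_c)           # beta-hat estimates, to compare with the true betas
summary(fit_c)$sigma  # residual standard error s, an estimate of sigma = 1
```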


  4. The results after fitting a least squares regression to predict y using only x1 are as follows:

    The values of \(\hat\beta_0\) and \(\hat\beta_1\) are shown below, and in this case the estimates are not both as close to the true \(\beta\) values as they were in part (c):

    \(\hat\beta_0\) \(\approx\) 6.5235 \(\neq\) \(\beta_0\) = 2 (not close)

    \(\hat\beta_1\) \(\approx\) 1.9829 \(\approx\) \(\beta_1\) = 2 (close)

    The residual standard error is s = 1.267. This is a somewhat close approximation of \(\sigma = \sqrt{\sigma^2} = 1\), though not as close as in part (c).

    Yes, we can reject the null hypothesis \(H_0\): \(\beta_1\) = 0 because the summary shows its corresponding p-value is p \(\approx 6.64 \times 10^{-5}\), which is less than any reasonable significance level. Hence, there is evidence that x1 is a statistically significant variable.
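
    The part (d) fit follows the same pattern with a single predictor. When x2 is dropped and the predictors are uncorrelated, the intercept absorbs the omitted term's mean, \(\hat\beta_0 \approx 2 + 0.3\,\bar{x}_2\), which is why it lands near 6.5 here. A sketch, reusing the data above:

```r
# Part (d): least squares fit of y on x1 alone.
fit_d <- lm(y ~ x1)
summary(fit_d)  # intercept ~ 2 + 0.3 * mean(x2), since x2 is omitted
```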


  5. The results after fitting a least squares regression to predict y using only x2 are as follows:

    The values of \(\hat\beta_0\) and \(\hat\beta_2\) are shown below, and again the estimates are not both as close to the true \(\beta\) values as they were in part (c):

    \(\hat\beta_0\) \(\approx\) 2.927 \(\neq\) \(\beta_0\) = 2 (not very close)

    \(\hat\beta_2\) \(\approx\) 0.3047 \(\approx\) \(\beta_2\) = 0.3 (close)

    The residual standard error is s = 1.094, a close approximation of \(\sigma = \sqrt{\sigma^2} = 1\): not as close as in part (c), but much closer than in part (d).

    Yes, we can reject the null hypothesis \(H_0\): \(\beta_2\) = 0 because the summary shows its corresponding p-value is p \(\approx 2.46 \times 10^{-11}\), which is less than any reasonable significance level. Hence, there is evidence that x2 is a statistically significant variable.
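
    The part (e) fit mirrors part (d), and the three residual standard errors can then be compared directly against \(\sigma = 1\); a sketch, reusing the model objects above:

```r
# Part (e): least squares fit of y on x2 alone, then compare the residual
# standard errors of the three Exercise 1 models.
fit_e <- lm(y ~ x2)
summary(fit_e)
sapply(list(c = fit_c, d = fit_d, e = fit_e), sigma)  # each estimates sigma = 1
```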

2.


  1. The form of the linear model and the values of the regression coefficients are the same as in Exercise 1. The differences between Exercise 1 and Exercise 2 are as follows:

    • The correlation coefficient in Exercise 2 part (b) (\(\approx\) 0.998) is much higher and closer to 1 than the correlation coefficient in Exercise 1 part (b) (\(\approx\) 0.017). This is reflected in the difference between the two scatter plots.
    • In part (c), when fitting a least squares regression to predict y using both x1 and x2, the fit is not as good in Exercise 2 as it was in Exercise 1, because the \(\hat\beta_1\) and \(\hat\beta_2\) estimates are not close to the true \(\beta_1\) and \(\beta_2\) values. Also, both hypothesis tests fail to reject their nulls, so they do not confirm that x1 and x2 are statistically significant.
    • In part (d), when fitting a least squares regression to predict y using just x1, we find that in Exercise 2 both estimated \(\hat\beta\) values are close approximations of their true \(\beta\) values, whereas in Exercise 1 \(\hat\beta_0\) was not a close approximation of \(\beta_0\). The hypothesis tests in Exercise 1 and Exercise 2 both confirm that there is evidence that x1 is statistically significant.
    • In part (e), when fitting a least squares regression to predict y using just x2, we find that in Exercise 2 \(\hat\beta_0\) is a close approximation of the true \(\beta_0\) whereas in Exercise 1 it was not. However, in Exercise 2 \(\hat\beta_2\) is not a close approximation of the true \(\beta_2\), while in Exercise 1 it was close.
    • I believe the main reason these differences occur is that in Exercise 2 x1 and x2 are nearly collinear (correlation \(\approx\) 0.998): x2 is essentially a noisy linear function of x1, so the two predictors carry almost the same information and their separate effects cannot be estimated reliably.

  2. The correlation coefficient between x1 and x2 is 0.9975904.

    The scatter plot displaying the relationship between the variables x1 and x2 is presented below:
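
    A correlation this close to 1 means x2 is nearly a linear function of x1. The exercise's actual construction is not shown in this write-up, but a hypothetical recipe that produces a similar correlation is:

```r
# Hypothetical collinear predictors for Exercise 2; the actual construction
# used in the exercise may differ.
set.seed(1)
x1 <- runif(100)
x2 <- 0.5 * x1 + rnorm(100, sd = 0.01)    # tiny noise => correlation near 1
y  <- 2 + 2 * x1 + 0.3 * x2 + rnorm(100)  # same true model as Exercise 1
cor(x1, x2)
```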


  3. The results after fitting a least squares regression to predict y using x1 and x2 are as follows:

    The values of \(\hat\beta_0\), \(\hat\beta_1\), and \(\hat\beta_2\) are shown below, and we can see that these estimates are not all close approximations of the true \(\beta\) values:

    \(\hat\beta_0\) \(\approx\) 2.1305 \(\approx\) \(\beta_0\) = 2 (close)

    \(\hat\beta_1\) \(\approx\) -1.754 \(\neq\) \(\beta_1\) = 2 (not close)

    \(\hat\beta_2\) \(\approx\) 7.3967 \(\neq\) \(\beta_2\) = 0.3 (not close)

    The residual standard error is s = 1.056, a close approximation of \(\sigma = \sqrt{\sigma^2} = 1\).

    No, we cannot reject the null hypothesis \(H_0\): \(\beta_1\) = 0 because the summary shows its corresponding p-value is p \(\approx\) 0.76, which is greater than any reasonable significance level. Hence, we do not have sufficient evidence that x1 is statistically significant.

    No, we cannot reject the null hypothesis \(H_0\): \(\beta_2\) = 0 because the summary shows its corresponding p-value is p \(\approx\) 0.516, which is greater than any reasonable significance level. Hence, we do not have sufficient evidence that x2 is statistically significant.
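
    This is the classic symptom of collinearity: the overall fit is fine (s is close to 1), yet neither coefficient is individually distinguishable from zero, because each predictor adds almost nothing given the other. One standard way to quantify this, assuming the car package is available (it is not part of the original write-up), is the variance inflation factor:

```r
# Collinearity diagnostic for the Exercise 2 part (c) model.
library(car)             # provides vif(); assumed to be installed
fit2_c <- lm(y ~ x1 + x2)
vif(fit2_c)              # values far above 10 indicate severe collinearity
```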


  4. The results after fitting a least squares regression to predict y using only x1 are as follows:

    The values of \(\hat\beta_0\) and \(\hat\beta_1\) are shown below, and in this case both estimates are good approximations of the true \(\beta\) values:

    \(\hat\beta_0\) \(\approx\) 2.1172 \(\approx\) \(\beta_0\) = 2 (close)

    \(\hat\beta_1\) \(\approx\) 1.9675 \(\approx\) \(\beta_1\) = 2 (close)

    The residual standard error is s = 1.053, also a close approximation of \(\sigma = \sqrt{\sigma^2} = 1\).

    Yes, we can reject the null hypothesis \(H_0\): \(\beta_1\) = 0 because the summary shows its corresponding p-value is p \(\approx 2.79 \times 10^{-6}\), which is less than any reasonable significance level. Hence, there is evidence that x1 is a statistically significant variable.


  5. The results after fitting a least squares regression to predict y using only x2 are as follows:

    The values of \(\hat\beta_0\) and \(\hat\beta_2\) are shown below, and in this case the estimates are not both close to the true \(\beta\) values:

    \(\hat\beta_0\) \(\approx\) 2.1199 \(\approx\) \(\beta_0\) = 2 (close)

    \(\hat\beta_2\) \(\approx\) 3.9273 \(\neq\) \(\beta_2\) = 0.3 (not close)

    The residual standard error is s = 1.051, also a close approximation of \(\sigma = \sqrt{\sigma^2} = 1\).

    Yes, we can reject the null hypothesis \(H_0\): \(\beta_2\) = 0 because the summary shows its corresponding p-value is p \(\approx 2.35 \times 10^{-6}\), which is less than any reasonable significance level. Hence, there is evidence that x2 is a statistically significant variable.

3.  

  1. After re-fitting the models from parts (c), (d), and (e) with this new data, the effects of adding the new observation to each model, compared with the models in Exercise 2, are as follows (a code sketch for the diagnostics follows the list):

    • Model C: Our estimated \(\hat\beta\)s are still not all close to their true \(\beta\) values; while they are closer than in Exercise 2, they are still far off (only \(\hat\beta_0\) remains close to \(\beta_0\)). In the hypothesis tests, we now find evidence that x2 is statistically significant, whereas in Exercise 2 we did not. This observation is an outlier for this model and possibly a high-leverage point, since it appears to cross the Cook's distance contour in the Residuals vs. Leverage diagnostic plot, as we can see below:
    • Model D: In this case, both \(\hat\beta_0\) and \(\hat\beta_1\) are close approximations of their true \(\beta\) values, as they were in Exercise 2 part (d). The hypothesis test gives results similar to Exercise 2's. This observation is an outlier for this model but not a high-leverage point; as we can see below in the Residuals vs. Leverage plot, it clearly does not cross the Cook's distance contour:
    • Model E: This model's results are similar to those of the Exercise 2 part (e) model. The observation is an outlier for this model but does not appear to be a high-leverage point; as we can see below in the Residuals vs. Leverage plot, it clearly does not cross the Cook's distance contour:
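
    A sketch of the refitting and diagnostics, assuming the Exercise 2 vectors are in the workspace. The coordinate values for the new observation below are illustrative placeholders, not the exercise's actual numbers:

```r
# Exercise 3: append the new observation and refit.
new_x1 <- 0.1; new_x2 <- 0.8; new_y <- 6  # placeholders: use the exercise's values
x1 <- c(x1, new_x1)
x2 <- c(x2, new_x2)
y  <- c(y,  new_y)

fit3_c <- lm(y ~ x1 + x2)
plot(fit3_c, which = 5)    # Residuals vs Leverage plot with Cook's contours
i <- length(y)             # index of the added observation
cooks.distance(fit3_c)[i]  # its influence
hatvalues(fit3_c)[i]       # its leverage
```
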
  2. Part (b) did not need to be redone for Exercise 3.


  3. The results after fitting a least squares regression to predict y using x1 and x2 are as follows:

    The values of \(\hat\beta_0\), \(\hat\beta_1\), and \(\hat\beta_2\) are shown below, and we can see that these estimates are not all close approximations of the true \(\beta\) values:

    \(\hat\beta_0\) \(\approx\) 2.125 \(\approx\) \(\beta_0\) = 2 (close)

    \(\hat\beta_1\) \(\approx\) -0.5183 \(\neq\) \(\beta_1\) = 2 (not close)

    \(\hat\beta_2\) \(\approx\) 4.944 \(\neq\) \(\beta_2\) = 0.3 (not close)

    The residual standard error is s = 1.051, a close approximation of \(\sigma = \sqrt{\sigma^2} = 1\).

    No, we cannot reject the null hypothesis \(H_0\): \(\beta_1\) = 0 because the summary shows its corresponding p-value is p \(\approx\) 0.49563, which is greater than any reasonable significance level. Hence, we do not have sufficient evidence that x1 is statistically significant.

    Yes, we can reject the null hypothesis \(H_0\): \(\beta_2\) = 0 because the summary shows its corresponding p-value is p \(\approx\) 0.000695, which is less than any reasonable significance level. Hence, there is evidence that x2 is statistically significant.


  4. The results after fitting a least squares regression to predict y using only x1 are as follows:

    The values of \(\hat\beta_0\) and \(\hat\beta_1\) are shown below, and in this case both estimates are very close approximations of the true \(\beta\) values:

    \(\hat\beta_0\) \(\approx\) 2.2616 \(\approx\) \(\beta_0\) = 2 (close)

    \(\hat\beta_1\) \(\approx\) 1.7575 \(\approx\) \(\beta_1\) = 2 (close)

    The residual standard error is s = 1.109, a close approximation of \(\sigma = \sqrt{\sigma^2} = 1\), though slightly less close than in part (c).

    Yes, we can reject the null hypothesis \(H_0\): \(\beta_1\) = 0 because the summary shows its corresponding p-value is p \(\approx 4.5 \times 10^{-5}\), which is less than any reasonable significance level. Hence, there is evidence that x1 is statistically significant.


  5. The results after fitting a least squares regression to predict y using only x2 are as follows:

    The values of \(\hat\beta_0\) and \(\hat\beta_2\) are shown below, and in this case the estimates are not both close to the true \(\beta\) values:

    \(\hat\beta_0\) \(\approx\) 2.0773 \(\approx\) \(\beta_0\) = 2 (close)

    \(\hat\beta_2\) \(\approx\) 4.1164 \(\neq\) \(\beta_2\) = 0.3 (not close)

    The residual standard error is s = 1.041, a close approximation of \(\sigma = \sqrt{\sigma^2} = 1\).

    Yes, we can reject the null hypothesis \(H_0\): \(\beta_2\) = 0 because the summary shows its corresponding p-value is p \(\approx 1.34 \times 10^{-7}\), which is less than any reasonable significance level. Hence, there is evidence that x2 is statistically significant.