set.seed(1)
x1 = runif(100)
x2 = 0.5*x1 + rnorm(100)/10
y = 2 + 2*x1 + 0.3*x2 + rnorm(100)
Our model is \[ y\ =\ \beta_0+\beta_1X_1+\beta_2X_2 + \epsilon \] where \[\beta_0 = 2 \\ \beta_1 = 2 \\ \beta_2 = 0.3 \\ \epsilon \sim N(0,1 ) \]
cor(x1, x2)
## [1] 0.8351212
plot(x1, x2)
model = lm(y ~ x1 + x2)
summary(model)
##
## Call:
## lm(formula = y ~ x1 + x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8311 -0.7273 -0.0537 0.6338 2.3359
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.1305 0.2319 9.188 7.61e-15 ***
## x1 1.4396 0.7212 1.996 0.0487 *
## x2 1.0097 1.1337 0.891 0.3754
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.056 on 97 degrees of freedom
## Multiple R-squared: 0.2088, Adjusted R-squared: 0.1925
## F-statistic: 12.8 on 2 and 97 DF, p-value: 1.164e-05
(rounded to three decimal places) \[\hat{\beta_0} = 2.131 \\ \hat{\beta_1} = 1.44 \\ \hat{\beta_2} = 1.01 \\\]
Note that \[\beta_0 - \hat{\beta_0} = -0.131 \\ \beta_1 - \hat{\beta_1} = 0.56 \\ \beta_2- \hat{\beta_2} = -0.71 \]
These results indicate nontrivial bias between all of the parameters. Examing the p-values of our parameters, we cannot reject the hypothses for beta 1 or beta 2 with 95% confidence.
modelx1 = lm(y ~ x1)
summary(modelx1)
##
## Call:
## lm(formula = y ~ x1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.89495 -0.66874 -0.07785 0.59221 2.45560
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.1124 0.2307 9.155 8.27e-15 ***
## x1 1.9759 0.3963 4.986 2.66e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.055 on 98 degrees of freedom
## Multiple R-squared: 0.2024, Adjusted R-squared: 0.1942
## F-statistic: 24.86 on 1 and 98 DF, p-value: 2.661e-06
We can confidently reject the null hypothesis H0: ??1 = 0 under a least squares regression to predict y using only x1.
modelx2 = lm(y ~ x2)
summary(modelx2)
##
## Call:
## lm(formula = y ~ x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.62687 -0.75156 -0.03598 0.72383 2.44890
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.3899 0.1949 12.26 < 2e-16 ***
## x2 2.8996 0.6330 4.58 1.37e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.072 on 98 degrees of freedom
## Multiple R-squared: 0.1763, Adjusted R-squared: 0.1679
## F-statistic: 20.98 on 1 and 98 DF, p-value: 1.366e-05
We can confidently reject the null hypothesis H0: ??1 = 0 under a least squares regression to predict y using only x2.
Let us take a closer examination of \[ y\ =\ 2+2X_1+0.3X_2 + \epsilon \] where \[ X_1 \sim U(0,1) \\ X_2 = \frac{1}{2}X_1\ +\ \frac{1}{10}\epsilon \\ \epsilon \sim N(0,1) \]
It is perfectly possibly to rewrite y as two different linear equations \[ y_1 = 2+\frac{43}{20}X_1+\frac{103}{100}\epsilon\] and \[ y_2 = 2+\frac{43}{10}X_2+\frac{3}{5}\epsilon \] This suggests to us that either X1 or X2 might attequately model y seperately. We could even inspect the moment generating functions of all three functions to guarentee that their distributions match if time permitted.
However, we are still left with the question of why the model that utilized X1 and X2 did not fit where the other two did. Recall from part (b) that we found significant correlation between X1 and X2, as we would expect from dependent variables. This suggests collinearity between the variables. Collinearity has the effect of magnifying our error estimates when constructing our coefficients, leading to high p-values. Thus, it is not a contradiction for a model that includes two collinear predictors to have significant p-values while models of individual predictors do not.
set.seed(1)
x1 = runif(100)
x2 = 0.5*x1 + rnorm(100)/10
y = 2 + 2*x1 + 0.3*x2 + rnorm(100)
x1 = c(x1, .1)
x2 = c(x2, .8)
y = c(y, 6)
model = lm(y ~ x1 + x2)
modelx1 = lm(y ~ x1)
modelx2 = lm(y ~ x2)
summary(model)
##
## Call:
## lm(formula = y ~ x1 + x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.73348 -0.69318 -0.05263 0.66385 2.30619
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.2267 0.2314 9.624 7.91e-16 ***
## x1 0.5394 0.5922 0.911 0.36458
## x2 2.5146 0.8977 2.801 0.00614 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.075 on 98 degrees of freedom
## Multiple R-squared: 0.2188, Adjusted R-squared: 0.2029
## F-statistic: 13.72 on 2 and 98 DF, p-value: 5.564e-06
plot(model)
summary(modelx1)
##
## Call:
## lm(formula = y ~ x1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8897 -0.6556 -0.0909 0.5682 3.5665
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.2569 0.2390 9.445 1.78e-15 ***
## x1 1.7657 0.4124 4.282 4.29e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.111 on 99 degrees of freedom
## Multiple R-squared: 0.1562, Adjusted R-squared: 0.1477
## F-statistic: 18.33 on 1 and 99 DF, p-value: 4.295e-05
plot(modelx1)
summary(modelx2)
##
## Call:
## lm(formula = y ~ x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.64729 -0.71021 -0.06899 0.72699 2.38074
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.3451 0.1912 12.264 < 2e-16 ***
## x2 3.1190 0.6040 5.164 1.25e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.074 on 99 degrees of freedom
## Multiple R-squared: 0.2122, Adjusted R-squared: 0.2042
## F-statistic: 26.66 on 1 and 99 DF, p-value: 1.253e-06
plot(modelx2)
yBoth = coefficients(model)[1] + coefficients(model)[2]*x1 + coefficients(model)[3]*x2
yx1 = coefficients(modelx1)[1] + coefficients(modelx1)[2]*x1
yx2 = coefficients(modelx2)[1] + coefficients(modelx2)[2]*x2
upperyboth = mean(yBoth) + 2*sd(yBoth)
loweryboth = mean(yBoth) - 2*sd(yBoth)
upperyx1 = mean(yx1) + 2*sd(yx1)
loweryx1 = mean(yx1) - 2*sd(yx1)
upperyx2 = mean(yx2) + 2*sd(yx2)
loweryx2 = mean(yx2) - 2*sd(yx2)
The ovservation 101 is an outlier for all three models, as each model has a maximum value of less than 4.5, and the observation has a value of 6. The addition of our new point turns x1 into our weakest predictor. While both models with individual predictors perform well with low p-values for beta 1, the x1 model’s residual is 6 points smaller than the other two models. It is a high leverage point of all three models. Observation 101 only signficantly disrupts model x1, and even makes the x1 + x2 model workable.
I believe that adding observation 101 disrubted the collinearity.