(a) Perform the following commands in R. The last line corresponds to creating a linear model in which y is a function of x1 and x2. Write out the form of the linear model. What are the regression coefficients?

set.seed(1)
x1 = runif(100)
x2 = 0.5*x1 + rnorm(100)/10
y = 2 + 2*x1 + 0.3*x2 + rnorm(100)

Our model is \[ y\ =\ \beta_0+\beta_1X_1+\beta_2X_2 + \epsilon \] where \[\beta_0 = 2 \\ \beta_1 = 2 \\ \beta_2 = 0.3 \\ \epsilon \sim N(0,1 ) \]

(b) What is the correlation between x1 and x2? Create a scatterplot displaying the relationship between the variables.

cor(x1, x2)
## [1] 0.8351212
plot(x1, x2)

(c) Using this data, fit a least squares regression to predict y using x1 and x2. Describe the results obtained. What are ??^0, ??^1, and ??^2? How do these relate to the true ??0, ??1, and ??2? Can you reject the null hypothesis H0: ??1 = 0? How about the null hypothesis H0: ??2 = 0?

model = lm(y ~ x1 + x2)
summary(model)
## 
## Call:
## lm(formula = y ~ x1 + x2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8311 -0.7273 -0.0537  0.6338  2.3359 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.1305     0.2319   9.188 7.61e-15 ***
## x1            1.4396     0.7212   1.996   0.0487 *  
## x2            1.0097     1.1337   0.891   0.3754    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.056 on 97 degrees of freedom
## Multiple R-squared:  0.2088, Adjusted R-squared:  0.1925 
## F-statistic:  12.8 on 2 and 97 DF,  p-value: 1.164e-05

(rounded to three decimal places) \[\hat{\beta_0} = 2.131 \\ \hat{\beta_1} = 1.44 \\ \hat{\beta_2} = 1.01 \\\]

Note that \[\beta_0 - \hat{\beta_0} = -0.131 \\ \beta_1 - \hat{\beta_1} = 0.56 \\ \beta_2- \hat{\beta_2} = -0.71 \]

These results indicate nontrivial bias between all of the parameters. Examing the p-values of our parameters, we cannot reject the hypothses for beta 1 or beta 2 with 95% confidence.

(d) Now fit a least squares regression to predict y using only x1. Comment on your results. Can you reject the null hypothesis H0: ??1 = 0?

modelx1 = lm(y ~ x1)
summary(modelx1)
## 
## Call:
## lm(formula = y ~ x1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.89495 -0.66874 -0.07785  0.59221  2.45560 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.1124     0.2307   9.155 8.27e-15 ***
## x1            1.9759     0.3963   4.986 2.66e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.055 on 98 degrees of freedom
## Multiple R-squared:  0.2024, Adjusted R-squared:  0.1942 
## F-statistic: 24.86 on 1 and 98 DF,  p-value: 2.661e-06

We can confidently reject the null hypothesis H0: ??1 = 0 under a least squares regression to predict y using only x1.

(e) Now fit a least squares regression to predict y using only x2. Comment on your results. Can you reject the null hypothesis H0: ??1 = 0?

modelx2 = lm(y ~ x2)
summary(modelx2)
## 
## Call:
## lm(formula = y ~ x2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.62687 -0.75156 -0.03598  0.72383  2.44890 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.3899     0.1949   12.26  < 2e-16 ***
## x2            2.8996     0.6330    4.58 1.37e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.072 on 98 degrees of freedom
## Multiple R-squared:  0.1763, Adjusted R-squared:  0.1679 
## F-statistic: 20.98 on 1 and 98 DF,  p-value: 1.366e-05

We can confidently reject the null hypothesis H0: ??1 = 0 under a least squares regression to predict y using only x2.

(f) Do the results obtained in (c)-(e) contradict each other? Explain your answer.

Let us take a closer examination of \[ y\ =\ 2+2X_1+0.3X_2 + \epsilon \] where \[ X_1 \sim U(0,1) \\ X_2 = \frac{1}{2}X_1\ +\ \frac{1}{10}\epsilon \\ \epsilon \sim N(0,1) \]

It is perfectly possibly to rewrite y as two different linear equations \[ y_1 = 2+\frac{43}{20}X_1+\frac{103}{100}\epsilon\] and \[ y_2 = 2+\frac{43}{10}X_2+\frac{3}{5}\epsilon \] This suggests to us that either X1 or X2 might attequately model y seperately. We could even inspect the moment generating functions of all three functions to guarentee that their distributions match if time permitted.

However, we are still left with the question of why the model that utilized X1 and X2 did not fit where the other two did. Recall from part (b) that we found significant correlation between X1 and X2, as we would expect from dependent variables. This suggests collinearity between the variables. Collinearity has the effect of magnifying our error estimates when constructing our coefficients, leading to high p-values. Thus, it is not a contradiction for a model that includes two collinear predictors to have significant p-values while models of individual predictors do not.

Now suppose we obtain one additional observation, which was unfortunately mismeasured. Re-fit the linear models from (c) to (e) using this new data. What effect does this new observation have on the each of the models? In each model, is this observation an outlier? A high-leverage point? Both? Explain your answers.

set.seed(1)
x1 = runif(100)
x2 = 0.5*x1 + rnorm(100)/10
y = 2 + 2*x1 + 0.3*x2 + rnorm(100)

x1 = c(x1, .1)
x2 = c(x2, .8)
y = c(y, 6)

model = lm(y ~ x1 + x2)
modelx1 = lm(y ~ x1)
modelx2 = lm(y ~ x2)
summary(model)
## 
## Call:
## lm(formula = y ~ x1 + x2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.73348 -0.69318 -0.05263  0.66385  2.30619 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.2267     0.2314   9.624 7.91e-16 ***
## x1            0.5394     0.5922   0.911  0.36458    
## x2            2.5146     0.8977   2.801  0.00614 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.075 on 98 degrees of freedom
## Multiple R-squared:  0.2188, Adjusted R-squared:  0.2029 
## F-statistic: 13.72 on 2 and 98 DF,  p-value: 5.564e-06
plot(model)

summary(modelx1)
## 
## Call:
## lm(formula = y ~ x1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8897 -0.6556 -0.0909  0.5682  3.5665 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.2569     0.2390   9.445 1.78e-15 ***
## x1            1.7657     0.4124   4.282 4.29e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.111 on 99 degrees of freedom
## Multiple R-squared:  0.1562, Adjusted R-squared:  0.1477 
## F-statistic: 18.33 on 1 and 99 DF,  p-value: 4.295e-05
plot(modelx1)

summary(modelx2)
## 
## Call:
## lm(formula = y ~ x2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.64729 -0.71021 -0.06899  0.72699  2.38074 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.3451     0.1912  12.264  < 2e-16 ***
## x2            3.1190     0.6040   5.164 1.25e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.074 on 99 degrees of freedom
## Multiple R-squared:  0.2122, Adjusted R-squared:  0.2042 
## F-statistic: 26.66 on 1 and 99 DF,  p-value: 1.253e-06
plot(modelx2)

yBoth = coefficients(model)[1] + coefficients(model)[2]*x1 + coefficients(model)[3]*x2
yx1 = coefficients(modelx1)[1] + coefficients(modelx1)[2]*x1
yx2 = coefficients(modelx2)[1] + coefficients(modelx2)[2]*x2

upperyboth = mean(yBoth) + 2*sd(yBoth)
loweryboth = mean(yBoth) - 2*sd(yBoth)

upperyx1 = mean(yx1) + 2*sd(yx1)
loweryx1 = mean(yx1) - 2*sd(yx1)

upperyx2 = mean(yx2) + 2*sd(yx2)
loweryx2 = mean(yx2) - 2*sd(yx2)

The ovservation 101 is an outlier for all three models, as each model has a maximum value of less than 4.5, and the observation has a value of 6. The addition of our new point turns x1 into our weakest predictor. While both models with individual predictors perform well with low p-values for beta 1, the x1 model’s residual is 6 points smaller than the other two models. It is a high leverage point of all three models. Observation 101 only signficantly disrupts model x1, and even makes the x1 + x2 model workable.

I believe that adding observation 101 disrubted the collinearity.