Problem 2.

Carefully explain the differences between the KNN classifier and KNN regression methods.

The K-nearest neighbors (KNN) classifier is a predictive method suited to qualitative (categorical) responses. It identifies the K training points nearest to a test observation and then classifies that observation using estimated conditional probabilities: by observing the classes of the K nearest neighbors, it assigns the test observation to the class that is most common among them. KNN regression is the analogous method for quantitative (continuous) responses. It uses the same neighbor-finding logic, identifying the K training observations closest to a prediction point, but instead of taking a majority vote it estimates the response as the average of the K selected training responses. In short, both methods rely on the same notion of proximity; they differ in the type of output (a class label versus a numeric prediction) and in how the neighbors' responses are aggregated (a vote versus an average).
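
As a brief illustration, here is a minimal sketch of both methods on simulated data. It assumes the class and FNN packages are installed; knn() and knn.reg() come from those packages, not from the discussion above.

library(class)  # provides knn() for classification
library(FNN)    # provides knn.reg() for regression

set.seed(1)
train.x <- matrix(rnorm(100), ncol = 2)  # 50 training points
test.x  <- matrix(rnorm(20),  ncol = 2)  # 10 test points

# Classification: assign each test point the majority class among its K neighbors
train.cl <- factor(ifelse(train.x[, 1] + train.x[, 2] > 0, "A", "B"))
knn(train.x, test.x, cl = train.cl, k = 5)

# Regression: predict each test point as the average response of its K neighbors
train.y <- train.x[, 1] + rnorm(50)
knn.reg(train.x, test.x, y = train.y, k = 5)$pred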

Problem 9.

This question involves the use of multiple linear regression on the Auto data set.

library(ISLR)
attach(Auto)

(a) Produce a scatterplot matrix which includes all of the variables in the data set.

pairs(Auto)

(b) Compute the matrix of correlations between the variables using the function cor(). You will need to exclude the name variable, which is qualitative.

cor(Auto[-9])  # drop column 9, the qualitative name variable
##                     mpg  cylinders displacement horsepower     weight
## mpg           1.0000000 -0.7776175   -0.8051269 -0.7784268 -0.8322442
## cylinders    -0.7776175  1.0000000    0.9508233  0.8429834  0.8975273
## displacement -0.8051269  0.9508233    1.0000000  0.8972570  0.9329944
## horsepower   -0.7784268  0.8429834    0.8972570  1.0000000  0.8645377
## weight       -0.8322442  0.8975273    0.9329944  0.8645377  1.0000000
## acceleration  0.4233285 -0.5046834   -0.5438005 -0.6891955 -0.4168392
## year          0.5805410 -0.3456474   -0.3698552 -0.4163615 -0.3091199
## origin        0.5652088 -0.5689316   -0.6145351 -0.4551715 -0.5850054
##              acceleration       year     origin
## mpg             0.4233285  0.5805410  0.5652088
## cylinders      -0.5046834 -0.3456474 -0.5689316
## displacement   -0.5438005 -0.3698552 -0.6145351
## horsepower     -0.6891955 -0.4163615 -0.4551715
## weight         -0.4168392 -0.3091199 -0.5850054
## acceleration    1.0000000  0.2903161  0.2127458
## year            0.2903161  1.0000000  0.1815277
## origin          0.2127458  0.1815277  1.0000000
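
For convenience, the correlations with mpg can be pulled out of this matrix and sorted directly; a small sketch:

sort(cor(Auto[-9])[, "mpg"])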

(c) Use the lm() function to perform a multiple linear regression with mpg as the response and all other variables except name as the predictors. Use the summary() function to print the results.

model=lm(mpg~.-name,data=Auto)
summary(model)
## 
## Call:
## lm(formula = mpg ~ . - name, data = Auto)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.5903 -2.1565 -0.1169  1.8690 13.0604 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -17.218435   4.644294  -3.707  0.00024 ***
## cylinders     -0.493376   0.323282  -1.526  0.12780    
## displacement   0.019896   0.007515   2.647  0.00844 ** 
## horsepower    -0.016951   0.013787  -1.230  0.21963    
## weight        -0.006474   0.000652  -9.929  < 2e-16 ***
## acceleration   0.080576   0.098845   0.815  0.41548    
## year           0.750773   0.050973  14.729  < 2e-16 ***
## origin         1.426141   0.278136   5.127 4.67e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.328 on 384 degrees of freedom
## Multiple R-squared:  0.8215, Adjusted R-squared:  0.8182 
## F-statistic: 252.4 on 7 and 384 DF,  p-value: < 2.2e-16

Comment on the output. For instance:
i. Is there a relationship between the predictors and the response?
Yes. The F-statistic of 252.4, with a p-value below 2.2e-16, indicates a relationship between the predictors and the response. At the individual level, the fitted model suggests a relationship between mpg and the predictors displacement, weight, year, and origin. The predictors cylinders, horsepower, and acceleration do not show a significant relationship with mpg, given p-values greater than 0.05. The R-squared value shows that 82.15% of the variance in mpg can be explained by the predictors in this regression model.

ii. Which predictors appear to have a statistically significant relationship to the response?
The predictors that appear to have a statistically significant relationship to the response mpg are displacement, weight, year, and origin; all four have p-values less than 0.05.

iii. What does the coefficient for the year variable suggest?
The regression coefficient for year is 0.750773. This suggests that, with every other predictor held constant, cars become more fuel efficient over time: predicted mpg increases by about 0.75 for each additional model year.

(d) Use the plot() function to produce diagnostic plots of the linear regression fit. Comment on any problems you see with the fit. Do the residual plots suggest any unusually large outliers? Does the leverage plot identify any observations with unusually high leverage?

par(mfrow=c(2,2))
plot(model)

The U-shaped pattern in the Residuals vs. Fitted plot indicates that the relationship is not entirely linear. The Normal Q-Q plot suggests the residuals are approximately normal in the center but right-skewed in the upper tail. The Scale-Location plot suggests that the variance of the residuals is not constant, as their spread increases with the fitted values. From the Residuals vs. Leverage plot, we can see that observation 14 has noticeably higher leverage than the rest, but it does not fall beyond the Cook's distance lines. The plots do not suggest any unusually large outliers.
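
The high-leverage point can also be confirmed numerically with base R's hatvalues(); a quick sketch (the flagged index depends on the fitted model):

hat <- hatvalues(model)
which.max(hat)  # index of the highest-leverage observation
mean(hat)       # average leverage, equal to (p + 1)/n = 8/392 for this model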

(e) Use the * and : symbols to fit linear regression models with interaction effects. Do any interactions appear to be statistically significant?

model=lm(mpg~.-name+displacement*weight,data=Auto)
summary(model)
## 
## Call:
## lm(formula = mpg ~ . - name + displacement * weight, data = Auto)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.9027 -1.8092 -0.0946  1.5549 12.1687 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -5.389e+00  4.301e+00  -1.253   0.2109    
## cylinders            1.175e-01  2.943e-01   0.399   0.6899    
## displacement        -6.837e-02  1.104e-02  -6.193 1.52e-09 ***
## horsepower          -3.280e-02  1.238e-02  -2.649   0.0084 ** 
## weight              -1.064e-02  7.136e-04 -14.915  < 2e-16 ***
## acceleration         6.724e-02  8.805e-02   0.764   0.4455    
## year                 7.852e-01  4.553e-02  17.246  < 2e-16 ***
## origin               5.610e-01  2.622e-01   2.139   0.0331 *  
## displacement:weight  2.269e-05  2.257e-06  10.054  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.964 on 383 degrees of freedom
## Multiple R-squared:  0.8588, Adjusted R-squared:  0.8558 
## F-statistic: 291.1 on 8 and 383 DF,  p-value: < 2.2e-16
model=lm(mpg~.-name+displacement*weight+acceleration*horsepower+cylinders*weight,data=Auto)
summary(model)
## 
## Call:
## lm(formula = mpg ~ . - name + displacement * weight + acceleration * 
##     horsepower + cylinders * weight, data = Auto)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.849 -1.620  0.035  1.492 12.002 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             -1.247e+01  6.070e+00  -2.053   0.0407 *  
## cylinders               -1.268e+00  1.538e+00  -0.825   0.4100    
## displacement            -4.872e-02  2.389e-02  -2.040   0.0421 *  
## horsepower               6.296e-02  2.526e-02   2.492   0.0131 *  
## weight                  -9.994e-03  1.596e-03  -6.261 1.03e-09 ***
## acceleration             6.654e-01  1.638e-01   4.061 5.92e-05 ***
## year                     7.834e-01  4.457e-02  17.577  < 2e-16 ***
## origin                   4.845e-01  2.594e-01   1.868   0.0626 .  
## displacement:weight      1.269e-05  6.561e-06   1.934   0.0539 .  
## horsepower:acceleration -7.876e-03  1.824e-03  -4.318 2.01e-05 ***
## cylinders:weight         4.943e-04  4.545e-04   1.088   0.2774    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.901 on 381 degrees of freedom
## Multiple R-squared:  0.8654, Adjusted R-squared:  0.8618 
## F-statistic: 244.9 on 10 and 381 DF,  p-value: < 2.2e-16
model=lm(mpg~.-name+displacement*weight+acceleration*horsepower+cylinders*weight+year*origin,data=Auto)
summary(model)
## 
## Call:
## lm(formula = mpg ~ . - name + displacement * weight + acceleration * 
##     horsepower + cylinders * weight + year * origin, data = Auto)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.0077 -1.6880  0.0343  1.3731 12.7822 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              8.845e+00  9.215e+00   0.960 0.337743    
## cylinders               -1.120e+00  1.522e+00  -0.736 0.462409    
## displacement            -5.387e-02  2.369e-02  -2.274 0.023533 *  
## horsepower               5.740e-02  2.506e-02   2.291 0.022516 *  
## weight                  -9.879e-03  1.580e-03  -6.254 1.08e-09 ***
## acceleration             6.239e-01  1.626e-01   3.836 0.000146 ***
## year                     5.133e-01  9.895e-02   5.187 3.48e-07 ***
## origin                  -1.209e+01  4.133e+00  -2.926 0.003640 ** 
## displacement:weight      1.356e-05  6.497e-06   2.087 0.037596 *  
## horsepower:acceleration -7.212e-03  1.818e-03  -3.968 8.68e-05 ***
## cylinders:weight         4.394e-04  4.500e-04   0.976 0.329454    
## year:origin              1.618e-01  5.307e-02   3.049 0.002456 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.87 on 380 degrees of freedom
## Multiple R-squared:  0.8686, Adjusted R-squared:  0.8648 
## F-statistic: 228.3 on 11 and 380 DF,  p-value: < 2.2e-16

From the last model produced, we can see that the interaction between cylinders and weight is not significant, given a p-value well above 0.05 (0.329). The other interactions in this model (displacement:weight, horsepower:acceleration, and year:origin) are all significant at the 0.05 level, and the R-squared value shows that 86.86% of the variance in mpg can be explained by the predictors.
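
Whether a given interaction improves the fit can also be judged with a nested-model F-test via anova(); a sketch (refitting the base model first, since model was overwritten above):

fit0 <- lm(mpg ~ . - name, data = Auto)
fit1 <- update(fit0, . ~ . + displacement:weight)
anova(fit0, fit1)  # a small p-value indicates the interaction term helps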

(f) Try a few different transformations of the variables, such as log(X), √X, X². Comment on your findings.

summary(lm(mpg~.-name+log(displacement),data=Auto))
## 
## Call:
## lm(formula = mpg ~ . - name + log(displacement), data = Auto)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.1562  -1.8388  -0.0423   1.6999  11.7871 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        4.529e+01  8.485e+00   5.337 1.62e-07 ***
## cylinders          3.391e-03  3.025e-01   0.011 0.991060    
## displacement       7.744e-02  9.655e-03   8.021 1.29e-14 ***
## horsepower        -4.380e-02  1.304e-02  -3.358 0.000864 ***
## weight            -4.536e-03  6.404e-04  -7.083 6.80e-12 ***
## acceleration      -1.352e-02  9.142e-02  -0.148 0.882479    
## year               7.827e-01  4.695e-02  16.671  < 2e-16 ***
## origin             4.485e-01  2.799e-01   1.602 0.109926    
## log(displacement) -1.537e+01  1.804e+00  -8.520 3.70e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.055 on 383 degrees of freedom
## Multiple R-squared:  0.8499, Adjusted R-squared:  0.8468 
## F-statistic: 271.1 on 8 and 383 DF,  p-value: < 2.2e-16

Adding the transformed term log(displacement) produces a highly significant coefficient (p = 3.70e-16) and improves the overall fit, raising the R-squared from 0.8215 to 0.8499.

summary(lm(mpg~.-name+I(acceleration^2),data=Auto))
## 
## Call:
## lm(formula = mpg ~ . - name + I(acceleration^2), data = Auto)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.9680 -1.9266 -0.0124  1.9153 13.2722 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        5.1088174  6.4930423   0.787   0.4319    
## cylinders         -0.3181584  0.3165577  -1.005   0.3155    
## displacement       0.0090446  0.0076528   1.182   0.2380    
## horsepower        -0.0346411  0.0139094  -2.490   0.0132 *  
## weight            -0.0054113  0.0006719  -8.053 1.03e-14 ***
## acceleration      -2.6374431  0.5758788  -4.580 6.30e-06 ***
## year               0.7535781  0.0495815  15.199  < 2e-16 ***
## origin             1.3265929  0.2713219   4.889 1.49e-06 ***
## I(acceleration^2)  0.0790472  0.0165131   4.787 2.42e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.237 on 383 degrees of freedom
## Multiple R-squared:  0.8316, Adjusted R-squared:  0.828 
## F-statistic: 236.3 on 8 and 383 DF,  p-value: < 2.2e-16

Adding the squared term I(acceleration^2) produces a significant coefficient (p = 2.42e-06), making both acceleration terms significant where the original was not; the R-squared rises slightly to 0.8316.

summary(lm(mpg~.-name+sqrt(cylinders),data=Auto))
## 
## Call:
## lm(formula = mpg ~ . - name + sqrt(cylinders), data = Auto)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.7190  -2.1361  -0.1756   1.7299  12.9229 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      3.281e+01  1.453e+01   2.258 0.024490 *  
## cylinders        8.550e+00  2.513e+00   3.402 0.000739 ***
## displacement     2.001e-02  7.399e-03   2.704 0.007149 ** 
## horsepower      -2.867e-02  1.395e-02  -2.055 0.040585 *  
## weight          -6.365e-03  6.427e-04  -9.905  < 2e-16 ***
## acceleration     1.062e-01  9.757e-02   1.088 0.277224    
## year             7.474e-01  5.019e-02  14.891  < 2e-16 ***
## origin           1.255e+00  2.779e-01   4.514 8.46e-06 ***
## sqrt(cylinders) -4.261e+01  1.175e+01  -3.628 0.000325 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.276 on 383 degrees of freedom
## Multiple R-squared:  0.8274, Adjusted R-squared:  0.8238 
## F-statistic: 229.5 on 8 and 383 DF,  p-value: < 2.2e-16

Adding sqrt(cylinders) likewise produces a significant coefficient (p = 0.000325), where the original cylinders term was not significant; the R-squared rises slightly to 0.8274.
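
The three transformed models can be compared on a common scale, for example by AIC (lower is better); a sketch:

fits <- list(
  log.disp = lm(mpg ~ . - name + log(displacement), data = Auto),
  accel.sq = lm(mpg ~ . - name + I(acceleration^2), data = Auto),
  sqrt.cyl = lm(mpg ~ . - name + sqrt(cylinders),   data = Auto)
)
sapply(fits, AIC)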

Problem 10.

This question should be answered using the Carseats data set.

library(ISLR)
attach(Carseats)

(a) Fit a multiple regression model to predict Sales using Price, Urban, and US.

fit<-lm(Sales~Price+Urban+US)
summary(fit)
## 
## Call:
## lm(formula = Sales ~ Price + Urban + US)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9206 -1.6220 -0.0564  1.5786  7.0581 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.043469   0.651012  20.036  < 2e-16 ***
## Price       -0.054459   0.005242 -10.389  < 2e-16 ***
## UrbanYes    -0.021916   0.271650  -0.081    0.936    
## USYes        1.200573   0.259042   4.635 4.86e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2335 
## F-statistic: 41.52 on 3 and 396 DF,  p-value: < 2.2e-16

(b) Provide an interpretation of each coefficient in the model. Be careful—some of the variables in the model are qualitative!
From the table above, Price and US are significant predictors of Sales. Note that Sales is recorded in thousands of units. The Price coefficient of -0.054459 therefore means that, holding the other predictors fixed, a $1 increase in price is associated with a decrease in sales of about 54.5 units. The USYes coefficient means that stores in the US sell about 1,201 more units, on average, than stores outside the US. The Urban coefficient is not statistically significant (p = 0.936), so there is no evidence that urban location affects Sales.

(c) Write out the model in equation form, being careful to handle the qualitative variables properly. \(Sales=13.043469 - 0.054459Price - 0.021916Urban_{Yes} + 1.200573US_{Yes}\)
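
As a sanity check, the fitted equation can be evaluated through predict(); the input values here are illustrative, not taken from the data:

predict(fit, newdata = data.frame(Price = 120, Urban = "Yes", US = "Yes"))
# equivalent to 13.043469 - 0.054459*120 - 0.021916 + 1.200573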

(d) For which of the predictors can you reject the null hypothesis \(H_0 : \beta_j = 0\)?
Given p-values less than 0.05, we can reject the null hypothesis for the predictors Price and US, but not for Urban.

(e) On the basis of your response to the previous question, fit a smaller model that only uses the predictors for which there is evidence of association with the outcome.

fit<-lm(Sales~Price+US)
summary(fit)
## 
## Call:
## lm(formula = Sales ~ Price + US)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9269 -1.6286 -0.0574  1.5766  7.0515 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.03079    0.63098  20.652  < 2e-16 ***
## Price       -0.05448    0.00523 -10.416  < 2e-16 ***
## USYes        1.19964    0.25846   4.641 4.71e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2354 
## F-statistic: 62.43 on 2 and 397 DF,  p-value: < 2.2e-16

(f) How well do the models in (a) and (e) fit the data?
Neither model fits the data especially well: both have a multiple R-squared of 0.2393, meaning each explains only about 23.93% of the variance in Sales. The smaller model in (e) is marginally preferable, with a slightly higher adjusted R-squared (0.2354 vs. 0.2335).

(g) Using the model from (e), obtain 95 % confidence intervals for the coefficient(s).

confint(fit)
##                   2.5 %      97.5 %
## (Intercept) 11.79032020 14.27126531
## Price       -0.06475984 -0.04419543
## USYes        0.69151957  1.70776632

(h) Is there evidence of outliers or high leverage observations in the model from (e)?
Included below are several plots used to assess the presence of outliers or high-leverage observations in our model. In the Residuals vs. Fitted plot, the residuals scatter randomly around the zero line, which suggests a linear relationship is reasonable. The Residuals vs. Leverage plot shows a couple of points that appear to have higher leverage, but none beyond the Cook's distance lines. A common benchmark for leverage is the average leverage, \(\frac{(2+1)}{400} = 0.0075\); observations well above this value deserve a closer look.

par(mfrow=c(2,2))
plot(fit)
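
Using the average-leverage benchmark from above, we can count how many observations exceed it; a sketch:

hat <- hatvalues(fit)
sum(hat > 3/400)                    # observations above average leverage
head(sort(hat, decreasing = TRUE))  # the largest leverage values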

Below I have summarized the points that R flags as exceeding at least one standard threshold for influence.

summary(influence.measures(fit))
## Potentially influential observations of
##   lm(formula = Sales ~ Price + US) :
## 
##     dfb.1_ dfb.Pric dfb.USYs dffit   cov.r   cook.d hat    
## 26   0.24  -0.18    -0.17     0.28_*  0.97_*  0.03   0.01  
## 29  -0.10   0.10    -0.10    -0.18    0.97_*  0.01   0.01  
## 43  -0.11   0.10     0.03    -0.11    1.05_*  0.00   0.04_*
## 50  -0.10   0.17    -0.17     0.26_*  0.98    0.02   0.01  
## 51  -0.05   0.05    -0.11    -0.18    0.95_*  0.01   0.00  
## 58  -0.05  -0.02     0.16    -0.20    0.97_*  0.01   0.01  
## 69  -0.09   0.10     0.09     0.19    0.96_*  0.01   0.01  
## 126 -0.07   0.06     0.03    -0.07    1.03_*  0.00   0.03_*
## 160  0.00   0.00     0.00     0.01    1.02_*  0.00   0.02  
## 166  0.21  -0.23    -0.04    -0.24    1.02    0.02   0.03_*
## 172  0.06  -0.07     0.02     0.08    1.03_*  0.00   0.02  
## 175  0.14  -0.19     0.09    -0.21    1.03_*  0.02   0.03_*
## 210 -0.14   0.15    -0.10    -0.22    0.97_*  0.02   0.01  
## 270 -0.03   0.05    -0.03     0.06    1.03_*  0.00   0.02  
## 298 -0.06   0.06    -0.09    -0.15    0.97_*  0.01   0.00  
## 314 -0.05   0.04     0.02    -0.05    1.03_*  0.00   0.02_*
## 353 -0.02   0.03     0.09     0.15    0.97_*  0.01   0.00  
## 357  0.02  -0.02     0.02    -0.03    1.03_*  0.00   0.02  
## 368  0.26  -0.23    -0.11     0.27_*  1.01    0.02   0.02_*
## 377  0.14  -0.15     0.12     0.24    0.95_*  0.02   0.01  
## 384  0.00   0.00     0.00     0.00    1.02_*  0.00   0.02  
## 387 -0.03   0.04    -0.03     0.05    1.02_*  0.00   0.02  
## 396 -0.05   0.05     0.08     0.14    0.98_*  0.01   0.00

We can analyze these points further by refitting the regression with the flagged observations removed and comparing it to the regression fit on the full data set.

outlying.obs<-c(26,29,43,50,51,58,69,126,160,166,172,175,210,270,298,314,353,357,368,377,384,396)
Carseats.small<-Carseats[-outlying.obs,]
fit2<-lm(Sales~Price+US,data=Carseats.small)
summary(fit2)
## 
## Call:
## lm(formula = Sales ~ Price + US, data = Carseats.small)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.2772 -1.5953 -0.0449  1.5735  5.4274 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 12.902800   0.662642  19.472  < 2e-16 ***
## Price       -0.053710   0.005473  -9.813  < 2e-16 ***
## USYes        1.246667   0.247884   5.029 7.65e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.288 on 375 degrees of freedom
## Multiple R-squared:  0.2397, Adjusted R-squared:  0.2356 
## F-statistic: 59.11 on 2 and 375 DF,  p-value: < 2.2e-16

We can see from the refit that removing these observations does not result in a meaningful change in the fit: the coefficients, their significance, and the R-squared are all very close to those of the full-data model. By comparing the confidence intervals, we can also see that the intervals for the coefficient estimates from the reduced data set are contained within those from the full data set. It is safe to conclude that no observations need to be excluded from the model.
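
The interval comparison described above can be reproduced side by side (output omitted here):

confint(fit)   # model fit to the full data set
confint(fit2)  # model fit with the flagged observations removed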

Problem 12.

This problem involves simple linear regression without an intercept.

(a) Recall that the coefficient estimate \(\hat{\beta}\) for the linear regression of Y onto X without an intercept is given by (3.38). Under what circumstance is the coefficient estimate for the regression of X onto Y the same as the coefficient estimate for the regression of Y onto X? The coefficient estimate for the regression of \(Y\) onto \(X\) is \[\hat{\beta} = \frac{\sum_ix_iy_i}{\sum_jx_j^2};\] the coefficient estimate for the regression of \(X\) onto \(Y\) is \[\hat{\beta}' = \frac{\sum_ix_iy_i}{\sum_jy_j^2}.\] The coefficients are the same if \(\sum_jx_j^2 = \sum_jy_j^2\).

When the sum of squared values of \(X\) equals the sum of squared values of \(Y\) (for mean-centered variables, this amounts to \(X\) and \(Y\) having equal variance), the coefficient estimate for the regression of X onto Y is the same as the coefficient estimate for the regression of Y onto X.
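
One easy way to satisfy this condition numerically is to let Y be a permutation of X, which forces \(\sum_jx_j^2 = \sum_jy_j^2\) exactly; a quick sketch:

set.seed(1)
x <- rnorm(100)
y <- sample(x)               # same values, different order
coefficients(lm(y ~ x + 0))  # the two no-intercept slopes...
coefficients(lm(x ~ y + 0))  # ...are identical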

(b) Generate an example in R with n = 100 observations in which the coefficient estimate for the regression of X onto Y is different from the coefficient estimate for the regression of Y onto X.

x=rnorm(100)        # no seed is fixed, so the estimates below will vary from run to run
y=0.5*x+rnorm(100)  # sum(x^2) and sum(y^2) differ, so the two slope estimates differ
coefficients(lm(x~y+0))
##         y 
## 0.3854639
coefficients(lm(y~x+0))
##         x 
## 0.5761462

(c) Generate an example in R with n = 100 observations in which the coefficient estimate for the regression of X onto Y is the same as the coefficient estimate for the regression of Y onto X.

x=1:100
y=100:1   # the same values in reverse order, so sum(x^2) equals sum(y^2)
eg3<-lm(y~x+0)
summary(eg3)
## 
## Call:
## lm(formula = y ~ x + 0)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -49.75 -12.44  24.87  62.18  99.49 
## 
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)    
## x   0.5075     0.0866    5.86 6.09e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 50.37 on 99 degrees of freedom
## Multiple R-squared:  0.2575, Adjusted R-squared:   0.25 
## F-statistic: 34.34 on 1 and 99 DF,  p-value: 6.094e-08
eg4<-lm(x~y+0)
summary(eg4)
## 
## Call:
## lm(formula = x ~ y + 0)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -49.75 -12.44  24.87  62.18  99.49 
## 
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)    
## y   0.5075     0.0866    5.86 6.09e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 50.37 on 99 degrees of freedom
## Multiple R-squared:  0.2575, Adjusted R-squared:   0.25 
## F-statistic: 34.34 on 1 and 99 DF,  p-value: 6.094e-08