Question 2

KNN regression predicts a continuous numerical value. A KNN classifier predicts a categorical class label.
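The difference can be sketched in a few lines of base R. This is only an illustration: the toy data and the helper names `nearest_k`, `knn_regress`, and `knn_classify` are hypothetical and do not come from the assignment. Both methods find the same neighbours; they differ only in how the neighbours' responses are combined.

```r
# Toy training data (hypothetical values, for illustration only)
train_x     <- c(1, 2, 3, 6, 7, 8)
train_y_num <- c(1.0, 1.5, 2.0, 6.5, 7.0, 8.0)                  # quantitative response
train_y_cls <- c("low", "low", "low", "high", "high", "high")   # qualitative response

# Indices of the k training points closest to x0
nearest_k <- function(x0, k) order(abs(train_x - x0))[1:k]

# KNN regression: average the responses of the k nearest neighbours
knn_regress <- function(x0, k = 3) mean(train_y_num[nearest_k(x0, k)])

# KNN classification: majority vote among the k nearest neighbours
knn_classify <- function(x0, k = 3) {
  votes <- table(train_y_cls[nearest_k(x0, k)])
  names(votes)[which.max(votes)]
}

knn_regress(2.4)   # returns a number
knn_classify(2.4)  # returns a class label
```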

Question 9

a

library(ISLR2)

pairs(Auto[, -9], main = "Scatter Plot Matrix of Auto Dataset")

b

cor(Auto[, -9])
                    mpg  cylinders displacement horsepower     weight acceleration       year
mpg           1.0000000 -0.7776175   -0.8051269 -0.7784268 -0.8322442    0.4233285  0.5805410
cylinders    -0.7776175  1.0000000    0.9508233  0.8429834  0.8975273   -0.5046834 -0.3456474
displacement -0.8051269  0.9508233    1.0000000  0.8972570  0.9329944   -0.5438005 -0.3698552
horsepower   -0.7784268  0.8429834    0.8972570  1.0000000  0.8645377   -0.6891955 -0.4163615
weight       -0.8322442  0.8975273    0.9329944  0.8645377  1.0000000   -0.4168392 -0.3091199
acceleration  0.4233285 -0.5046834   -0.5438005 -0.6891955 -0.4168392    1.0000000  0.2903161
year          0.5805410 -0.3456474   -0.3698552 -0.4163615 -0.3091199    0.2903161  1.0000000
origin        0.5652088 -0.5689316   -0.6145351 -0.4551715 -0.5850054    0.2127458  0.1815277
                 origin
mpg           0.5652088
cylinders    -0.5689316
displacement -0.6145351
horsepower   -0.4551715
weight       -0.5850054
acceleration  0.2127458
year          0.1815277
origin        1.0000000

c

lr <- lm(mpg ~ . - name, data = Auto)
summary(lr)

Call:
lm(formula = mpg ~ . - name, data = Auto)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.5903 -2.1565 -0.1169  1.8690 13.0604 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -17.218435   4.644294  -3.707  0.00024 ***
cylinders     -0.493376   0.323282  -1.526  0.12780    
displacement   0.019896   0.007515   2.647  0.00844 ** 
horsepower    -0.016951   0.013787  -1.230  0.21963    
weight        -0.006474   0.000652  -9.929  < 2e-16 ***
acceleration   0.080576   0.098845   0.815  0.41548    
year           0.750773   0.050973  14.729  < 2e-16 ***
origin         1.426141   0.278136   5.127 4.67e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.328 on 384 degrees of freedom
Multiple R-squared:  0.8215,    Adjusted R-squared:  0.8182 
F-statistic: 252.4 on 7 and 384 DF,  p-value: < 2.2e-16
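  • Yes. The F-statistic of 252.4 (p-value < 2.2e-16) indicates a relationship between the predictors and the response.

  • Displacement, weight, year, and origin are statistically significant at the 0.05 level.

  • Holding the other predictors fixed, each one-year increase in model year is associated with an increase of about 0.75 MPG in fuel efficiency.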

d

plot(lr)

The diagnostic plots show a distinct U-shaped pattern in the residuals, which suggests non-linearity, along with a few outliers and one point with very high leverage.
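The visual reading can be complemented with numbers. The sketch below refits the model from part (c) and flags candidate outliers and the highest-leverage observation. The cutoff of 3 for studentized residuals is a common rule of thumb, assumed here rather than taken from the assignment.

```r
library(ISLR2)  # provides the Auto data set

lr <- lm(mpg ~ . - name, data = Auto)

# Studentized residuals: values beyond about +/-3 are often treated as possible outliers
outliers <- which(abs(rstudent(lr)) > 3)

# Leverage (hat values), compared against the average leverage (p + 1) / n
lev <- hatvalues(lr)
avg_lev <- length(coef(lr)) / nrow(Auto)
high_leverage <- which.max(lev)

outliers                       # row labels of candidate outliers
high_leverage                  # observation with the largest leverage
lev[high_leverage] / avg_lev   # how many times the average leverage it has
```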

e

lr2 <- lm(mpg ~ (. - name)^2, data = Auto)
summary(lr2)

Call:
lm(formula = mpg ~ (. - name)^2, data = Auto)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.6303 -1.4481  0.0596  1.2739 11.1386 

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)   
(Intercept)                3.548e+01  5.314e+01   0.668  0.50475   
cylinders                  6.989e+00  8.248e+00   0.847  0.39738   
displacement              -4.785e-01  1.894e-01  -2.527  0.01192 * 
horsepower                 5.034e-01  3.470e-01   1.451  0.14769   
weight                     4.133e-03  1.759e-02   0.235  0.81442   
acceleration              -5.859e+00  2.174e+00  -2.696  0.00735 **
year                       6.974e-01  6.097e-01   1.144  0.25340   
origin                    -2.090e+01  7.097e+00  -2.944  0.00345 **
cylinders:displacement    -3.383e-03  6.455e-03  -0.524  0.60051   
cylinders:horsepower       1.161e-02  2.420e-02   0.480  0.63157   
cylinders:weight           3.575e-04  8.955e-04   0.399  0.69000   
cylinders:acceleration     2.779e-01  1.664e-01   1.670  0.09584 . 
cylinders:year            -1.741e-01  9.714e-02  -1.793  0.07389 . 
cylinders:origin           4.022e-01  4.926e-01   0.816  0.41482   
displacement:horsepower   -8.491e-05  2.885e-04  -0.294  0.76867   
displacement:weight        2.472e-05  1.470e-05   1.682  0.09342 . 
displacement:acceleration -3.479e-03  3.342e-03  -1.041  0.29853   
displacement:year          5.934e-03  2.391e-03   2.482  0.01352 * 
displacement:origin        2.398e-02  1.947e-02   1.232  0.21875   
horsepower:weight         -1.968e-05  2.924e-05  -0.673  0.50124   
horsepower:acceleration   -7.213e-03  3.719e-03  -1.939  0.05325 . 
horsepower:year           -5.838e-03  3.938e-03  -1.482  0.13916   
horsepower:origin          2.233e-03  2.930e-02   0.076  0.93931   
weight:acceleration        2.346e-04  2.289e-04   1.025  0.30596   
weight:year               -2.245e-04  2.127e-04  -1.056  0.29182   
weight:origin             -5.789e-04  1.591e-03  -0.364  0.71623   
acceleration:year          5.562e-02  2.558e-02   2.174  0.03033 * 
acceleration:origin        4.583e-01  1.567e-01   2.926  0.00365 **
year:origin                1.393e-01  7.399e-02   1.882  0.06062 . 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.695 on 363 degrees of freedom
Multiple R-squared:  0.8893,    Adjusted R-squared:  0.8808 
F-statistic: 104.2 on 28 and 363 DF,  p-value: < 2.2e-16
The following interaction terms are statistically significant at the 0.05 level:
displacement:year 
acceleration:year
acceleration:origin
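With 21 interaction terms in the table, it can be easier to filter the coefficient matrix than to scan it by eye. A minimal sketch, assuming the 0.05 threshold:

```r
library(ISLR2)

lr2 <- lm(mpg ~ (. - name)^2, data = Auto)
coefs <- coef(summary(lr2))   # matrix of estimates, std. errors, t values, p-values

# Interaction terms are the rows whose names contain ":"
is_interaction <- grepl(":", rownames(coefs), fixed = TRUE)
significant <- coefs[is_interaction & coefs[, "Pr(>|t|)"] < 0.05, , drop = FALSE]

rownames(significant)
```

Because this many terms are tested at once, a few may fall below 0.05 by chance, so the list is best read as a starting point rather than a final selection.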

f

lr_trans <- lm(mpg ~ . - name + log(horsepower) + I(horsepower^2) + log(weight) + I(displacement^2), data = Auto)
summary(lr_trans)

Call:
lm(formula = mpg ~ . - name + log(horsepower) + I(horsepower^2) + 
    log(weight) + I(displacement^2), data = Auto)

Residuals:
   Min     1Q Median     3Q    Max 
-9.273 -1.497 -0.110  1.446 11.974 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)        1.415e+02  4.757e+01   2.976  0.00311 ** 
cylinders          1.732e-01  3.648e-01   0.475  0.63521    
displacement      -3.681e-02  1.994e-02  -1.846  0.06564 .  
horsepower         4.667e-02  1.714e-01   0.272  0.78556    
weight             1.078e-03  2.115e-03   0.510  0.61049    
acceleration      -2.018e-01  1.005e-01  -2.008  0.04533 *  
year               7.657e-01  4.514e-02  16.963  < 2e-16 ***
origin             5.465e-01  2.670e-01   2.046  0.04140 *  
log(horsepower)   -1.375e+01  9.530e+00  -1.442  0.15002    
I(horsepower^2)    6.682e-05  3.507e-04   0.191  0.84901    
log(weight)       -1.469e+01  6.803e+00  -2.159  0.03145 *  
I(displacement^2)  6.712e-05  3.436e-05   1.954  0.05148 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.905 on 380 degrees of freedom
Multiple R-squared:  0.8654,    Adjusted R-squared:  0.8615 
F-statistic: 222.1 on 11 and 380 DF,  p-value: < 2.2e-16

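Adding the log and squared terms reduces the non-linear pattern and improves the fit: adjusted R-squared rises from 0.8182 to 0.8615, and the residual standard error falls from 3.328 to 2.905.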
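One way to check this claim is to put the two fits side by side and compare their residual plots. The sketch below refits both models; the object name `fit_stats` is introduced here for illustration.

```r
library(ISLR2)

lr <- lm(mpg ~ . - name, data = Auto)
lr_trans <- lm(mpg ~ . - name + log(horsepower) + I(horsepower^2) +
                 log(weight) + I(displacement^2), data = Auto)

# Side-by-side fit statistics
fit_stats <- data.frame(
  model  = c("linear", "transformed"),
  adj_r2 = c(summary(lr)$adj.r.squared, summary(lr_trans)$adj.r.squared),
  rse    = c(sigma(lr), sigma(lr_trans))
)
fit_stats

# Residuals vs fitted for both models; a flatter trend line suggests less non-linearity
op <- par(mfrow = c(1, 2))
plot(lr, which = 1)
plot(lr_trans, which = 1)
par(op)
```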

Question 10

a

lr <- lm(Sales ~ Price + Urban + US, data = Carseats)
summary(lr)

Call:
lm(formula = Sales ~ Price + Urban + US, data = Carseats)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.9206 -1.6220 -0.0564  1.5786  7.0581 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 13.043469   0.651012  20.036  < 2e-16 ***
Price       -0.054459   0.005242 -10.389  < 2e-16 ***
UrbanYes    -0.021916   0.271650  -0.081    0.936    
USYes        1.200573   0.259042   4.635 4.86e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.472 on 396 degrees of freedom
Multiple R-squared:  0.2393,    Adjusted R-squared:  0.2335 
F-statistic: 41.52 on 3 and 396 DF,  p-value: < 2.2e-16

b

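  • Price - holding the other predictors fixed, each $1 increase in price is associated with a decrease in sales of about 0.054 thousand units (roughly 54 units).

  • UrbanYes - the coefficient is not statistically significant (p = 0.936), so there is no evidence that sales differ between urban and rural stores.

  • USYes - holding the other predictors fixed, stores in the US sell about 1.2 thousand more units on average than stores outside the US.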

c

$$Sales = \beta_0 + \beta_1(Price) + \beta_2(UrbanYes) + \beta_3(USYes) + \epsilon$$
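Substituting the estimates from part (a) gives the fitted version of this equation. The values are rounded, Sales is measured in thousands of units, and UrbanYes and USYes are 0/1 indicators:

```latex
$$\widehat{Sales} = 13.043 - 0.054(Price) - 0.022(UrbanYes) + 1.201(USYes)$$

$$UrbanYes = \begin{cases} 1 & \text{if the store is in an urban location} \\ 0 & \text{otherwise} \end{cases}
\qquad
USYes = \begin{cases} 1 & \text{if the store is in the US} \\ 0 & \text{otherwise} \end{cases}$$
```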

d

We can reject the null hypothesis that the coefficient is zero for Price and USYes, but not for UrbanYes (p = 0.936).

e

lr <- lm(Sales ~ Price + US, data = Carseats)
summary(lr)

Call:
lm(formula = Sales ~ Price + US, data = Carseats)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.9269 -1.6286 -0.0574  1.5766  7.0515 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 13.03079    0.63098  20.652  < 2e-16 ***
Price       -0.05448    0.00523 -10.416  < 2e-16 ***
USYes        1.19964    0.25846   4.641 4.71e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.469 on 397 degrees of freedom
Multiple R-squared:  0.2393,    Adjusted R-squared:  0.2354 
F-statistic: 62.43 on 2 and 397 DF,  p-value: < 2.2e-16

f

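The second model fits the data slightly better than the first: R-squared is unchanged at 0.2393, while adjusted R-squared rises from 0.2335 to 0.2354 and the residual standard error falls from 2.472 to 2.469. Both models still explain only about 24% of the variance in Sales.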
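The comparison can also be made directly in code. This sketch refits both models under the hypothetical names `full` and `reduced` and adds a partial F-test for the dropped Urban term, which is not part of the original answer:

```r
library(ISLR2)

full    <- lm(Sales ~ Price + Urban + US, data = Carseats)
reduced <- lm(Sales ~ Price + US, data = Carseats)

# Adjusted R-squared and residual standard error for each model
c(full = summary(full)$adj.r.squared, reduced = summary(reduced)$adj.r.squared)
c(full = sigma(full), reduced = sigma(reduced))

# Partial F-test: does adding Urban back improve the fit?
cmp <- anova(reduced, full)
cmp
```

A large p-value in the F-test is consistent with dropping Urban.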

g

confint(lr)
                  2.5 %      97.5 %
(Intercept) 11.79032020 14.27126531
Price       -0.06475984 -0.04419543
USYes        0.69151957  1.70776632

h

Yes
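A short check like the following could support this answer. It is a sketch that assumes the model from part (e); the cutoffs (absolute studentized residual above 3, leverage above twice the average) are common rules of thumb, not values taken from the assignment.

```r
library(ISLR2)

fit <- lm(Sales ~ Price + US, data = Carseats)

n <- nrow(Carseats)
p <- length(coef(fit)) - 1      # number of predictors

stud <- rstudent(fit)           # studentized residuals
lev  <- hatvalues(fit)          # leverage statistics

sum(abs(stud) > 3)              # count of possible outliers
sum(lev > 2 * (p + 1) / n)      # count of possible high-leverage points

plot(fit, which = 5)            # residuals vs leverage
```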

Question 12

a

When the sum of squares of the X values equals the sum of squares of the Y values.
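The reasoning follows from the closed-form estimate for regression without an intercept. Writing the two slopes side by side (the subscripts are notation introduced here):

```latex
$$\hat{\beta}_{Y \sim X} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2},
\qquad
\hat{\beta}_{X \sim Y} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} y_i^2}$$

$$\hat{\beta}_{Y \sim X} = \hat{\beta}_{X \sim Y}
\iff \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i^2
\quad \text{(assuming } \textstyle\sum_{i} x_i y_i \neq 0\text{)}$$
```

The numerators are identical, so the two estimates agree exactly when the denominators do. If the cross-product sum is zero, both estimates are zero regardless.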

b

set.seed(42)
x <- rnorm(100, mean = 0, sd = 1)
y <- 2 * x + rnorm(100, mean = 0, sd = 2)  # sum(y^2) is much larger than sum(x^2)

x_fit <- lm(y ~ x - 1)
y_fit <- lm(x ~ y - 1)

coef(x_fit)
       x 
2.048971 
coef(y_fit)
        y 
0.2831267 
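The two coefficients differ because the no-intercept slope is sum(x * y) divided by the sum of squares of the predictor, and the sums of squares of x and y are not equal here. The sketch below regenerates the same data and checks the closed form against `lm()`; the names `b_y_on_x` and `b_x_on_y` are introduced for illustration.

```r
set.seed(42)
x <- rnorm(100, mean = 0, sd = 1)
y <- 2 * x + rnorm(100, mean = 0, sd = 2)

# Closed-form slopes for regression through the origin
b_y_on_x <- sum(x * y) / sum(x^2)
b_x_on_y <- sum(x * y) / sum(y^2)

c(sum_sq_x = sum(x^2), sum_sq_y = sum(y^2))   # the denominators differ

# Both should agree with the lm() estimates
all.equal(unname(coef(lm(y ~ x - 1))), b_y_on_x)
all.equal(unname(coef(lm(x ~ y - 1))), b_x_on_y)
```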

c

set.seed(42)
x <- rnorm(100, mean = 0, sd = 1)
y <- sample(x)  # a permutation of x, so sum(y^2) equals sum(x^2)

x_fit <- lm(y ~ x - 1)
y_fit <- lm(x ~ y - 1)

coef(x_fit)
          x 
-0.01997296 
coef(y_fit)
          y 
-0.01997296 
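The coefficients match because permuting x leaves its sum of squares unchanged. The sketch below confirms this and tries a second construction, negating x, which is an alternative added here and not part of the original answer. The helper `same_slopes` is hypothetical.

```r
set.seed(42)
x <- rnorm(100)

# Construction 1: a random permutation of x (as in the code above)
y_perm <- sample(x)
# Construction 2 (an alternative, not from the original answer): flip the sign
y_neg <- -x

# TRUE when the two no-intercept regressions give the same coefficient
same_slopes <- function(x, y) {
  isTRUE(all.equal(unname(coef(lm(y ~ x - 1))), unname(coef(lm(x ~ y - 1)))))
}

all.equal(sum(x^2), sum(y_perm^2))   # sums of squares agree
same_slopes(x, y_perm)
same_slopes(x, y_neg)
```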