# Loading in dependencies
pacman::p_load(ISLR2)

Question 2. Carefully explain the differences between the KNN classifier and KNN regression methods.

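Both methods find the K training observations closest to the point being predicted. They differ in the kind of response they predict and in how they combine the neighbors' responses. The K-Nearest Neighbors (KNN) classifier is used for classification, where the goal is to predict a discrete class label. It estimates the probability of each class as the fraction of the K neighbors belonging to that class, then assigns the point to the most common class. KNN regression is used when the goal is to predict a continuous value, and it predicts the average of the K neighbors' responses.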

For example, take K = 3 and suppose the three nearest neighbors (the training observations closest to the point being predicted) have classes [A, B, A]. The KNN classifier estimates P(A) = 2/3 and P(B) = 1/3, so it predicts class A. In KNN regression with K = 3, if the three nearest neighbors have responses [5, 7, 6], the prediction is (5 + 7 + 6) / 3 = 6.
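The two prediction rules can be sketched in base R. This is a minimal illustration, not a replacement for a package implementation. The helper names (nearest_idx, knn_classify, knn_regress) and the toy training data are hypothetical. The sketch assumes Euclidean distance on numeric predictors. The neighbor search is shared, and only the final step differs: a majority vote for classification and a mean for regression.

```r
# Hypothetical helper: row indices of the k training points closest to x0
nearest_idx <- function(X_train, x0, k) {
  # Euclidean distance from x0 to every row of X_train
  d <- sqrt(colSums((t(X_train) - x0)^2))
  order(d)[seq_len(k)]
}

# Classification: majority vote among the k neighbors
knn_classify <- function(X_train, y_train, x0, k) {
  votes <- table(y_train[nearest_idx(X_train, x0, k)])
  names(votes)[which.max(votes)]
}

# Regression: average response of the k neighbors
knn_regress <- function(X_train, y_train, x0, k) {
  mean(y_train[nearest_idx(X_train, x0, k)])
}

# Toy one-dimensional training set (made up for illustration)
X_train <- matrix(c(1, 2, 3, 10, 11), ncol = 1)
y_class <- c("A", "B", "A", "B", "B")
y_num   <- c(5, 7, 6, 20, 22)

knn_classify(X_train, y_class, x0 = 2, k = 3)  # neighbors B, A, A -> "A"
knn_regress(X_train, y_num, x0 = 2, k = 3)     # (7 + 5 + 6) / 3 = 6
```

which.max returns the first maximum, so a tied vote goes to the class that sorts first. Real implementations usually break ties at random instead.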

Question 9.

# Reading in data
auto = read.csv('Auto.csv', na.strings = '?', stringsAsFactors = TRUE)  # '?' marks missing horsepower values

Part a)

pairs(auto)

Part b)

cor(Filter(is.numeric, na.omit(auto)))
##                     mpg  cylinders displacement horsepower     weight
## mpg           1.0000000 -0.7776175   -0.8051269 -0.7784268 -0.8322442
## cylinders    -0.7776175  1.0000000    0.9508233  0.8429834  0.8975273
## displacement -0.8051269  0.9508233    1.0000000  0.8972570  0.9329944
## horsepower   -0.7784268  0.8429834    0.8972570  1.0000000  0.8645377
## weight       -0.8322442  0.8975273    0.9329944  0.8645377  1.0000000
## acceleration  0.4233285 -0.5046834   -0.5438005 -0.6891955 -0.4168392
## year          0.5805410 -0.3456474   -0.3698552 -0.4163615 -0.3091199
## origin        0.5652088 -0.5689316   -0.6145351 -0.4551715 -0.5850054
##              acceleration       year     origin
## mpg             0.4233285  0.5805410  0.5652088
## cylinders      -0.5046834 -0.3456474 -0.5689316
## displacement   -0.5438005 -0.3698552 -0.6145351
## horsepower     -0.6891955 -0.4163615 -0.4551715
## weight         -0.4168392 -0.3091199 -0.5850054
## acceleration    1.0000000  0.2903161  0.2127458
## year            0.2903161  1.0000000  0.1815277
## origin          0.2127458  0.1815277  1.0000000

Part c)

  1. Is there a relationship between the predictors and the response? Yes. The F-statistic is 252.4 on 7 and 384 degrees of freedom, with a p-value < 2.2e-16, so we reject the null hypothesis that all slope coefficients are zero. The model also explains a large share of the variation in mpg: R^2 is 0.8215 (adjusted R^2 = 0.8182), so about 82% of the variance is explained.

  2. Which predictors appear to have a statistically significant relationship to the response? At the 0.05 level, four predictors are significant: displacement (p = 0.00844), weight (p < 2e-16), year (p < 2e-16), and origin (p = 4.67e-07). Cylinders, horsepower, and acceleration are not significant once the other predictors are in the model.

  3. What does the coefficient for the year variable suggest? The coefficient is 0.7508. Holding the other predictors fixed, each additional model year is associated with an average increase of about 0.75 mpg, so newer cars tend to be more fuel efficient.

linear_model = lm(mpg ~ . -name, data=auto)
summary(linear_model)
## 
## Call:
## lm(formula = mpg ~ . - name, data = auto)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.5903 -2.1565 -0.1169  1.8690 13.0604 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -17.218435   4.644294  -3.707  0.00024 ***
## cylinders     -0.493376   0.323282  -1.526  0.12780    
## displacement   0.019896   0.007515   2.647  0.00844 ** 
## horsepower    -0.016951   0.013787  -1.230  0.21963    
## weight        -0.006474   0.000652  -9.929  < 2e-16 ***
## acceleration   0.080576   0.098845   0.815  0.41548    
## year           0.750773   0.050973  14.729  < 2e-16 ***
## origin         1.426141   0.278136   5.127 4.67e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.328 on 384 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.8215, Adjusted R-squared:  0.8182 
## F-statistic: 252.4 on 7 and 384 DF,  p-value: < 2.2e-16
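The three answers above can also be read off the fitted model in code. This is a sketch that assumes ISLR2's built-in Auto data. That data set has 392 rows and already drops the 5 rows with missing horsepower, so it should reproduce the fit above. The object names are illustrative.

```r
library(ISLR2)  # assumption: ISLR2's Auto, which already omits the 5 rows with missing horsepower

fit <- lm(mpg ~ . - name, data = Auto)
s <- summary(fit)

# Overall F-test of H0: every slope coefficient is zero
f <- s$fstatistic
overall_p <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

# Predictors (intercept excluded) with p-value below 0.05
coefs <- s$coefficients
significant <- setdiff(rownames(coefs)[coefs[, "Pr(>|t|)"] < 0.05], "(Intercept)")

# Expected change in mpg for one additional model year, other predictors held fixed
year_effect <- unname(coef(fit)["year"])

overall_p
significant
year_effect
```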

Part d) Comment on any problems you see with the fit. Do the residual plots suggest any unusually large outliers? Does the leverage plot identify any observations with unusually high leverage?

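The largest residual is 13.06, compared with a residual standard error of 3.328. That is roughly 3.9 residual standard errors, which suggests at least one possible outlier in the upper tail. In the Residuals vs Leverage plot, a few points sit well away from the bulk of the data, which suggests some observations may be influential. The plot flags observations 327, 394, and 14; by default, R labels the three points with the largest Cook's distance in this panel. These are the candidates for unusually high leverage or influence.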

par(mfrow = c(2, 2))
plot(linear_model)
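The diagnostic plots can be complemented with numeric checks. This is a minimal sketch that uses ISLR2's Auto in place of the CSV. It applies rule-of-thumb cut-offs: leverage above 3 times the average of (p + 1)/n, and |studentized residual| > 3. These cut-offs are assumptions rather than fixed rules. Row indices refer to ISLR2's Auto and may not match the labels in the plots above.

```r
library(ISLR2)  # assumption: ISLR2's Auto in place of the CSV read above

fit <- lm(mpg ~ . - name, data = Auto)

# Leverage: hat values average (p + 1) / n
h <- hatvalues(fit)
avg_lev <- length(coef(fit)) / nobs(fit)

# Heuristic cut-offs (assumptions, not fixed rules):
# leverage above 3x the average, |studentized residual| above 3
high_leverage <- which(h > 3 * avg_lev)
outliers <- which(abs(rstudent(fit)) > 3)

# Cook's distance combines residual size and leverage
top_cook <- head(sort(cooks.distance(fit), decreasing = TRUE), 3)

high_leverage
outliers
top_cook
```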

Part e) Use the * and : symbols to fit linear regression models with interaction effects. Do any interactions appear to be statistically significant?

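I fit the model containing every interaction among the seven predictors, using the * operator. I then let stepwise selection (AIC, both directions) choose which terms to keep. Among the interaction terms in the selected model, 14 are statistically significant at the 0.05 level: weight:acceleration, cylinders:weight:acceleration, displacement:weight:acceleration, horsepower:weight:acceleration, weight:acceleration:year, horsepower:acceleration:origin, horsepower:year:origin, displacement:horsepower:weight:acceleration, cylinders:horsepower:acceleration:year, displacement:weight:acceleration:year, displacement:weight:acceleration:origin, horsepower:weight:acceleration:origin, displacement:acceleration:year:origin, and displacement:horsepower:weight:acceleration:origin. These p-values should be read with caution. The selected model has 98 terms, so on the order of 5 (98 × 0.05) could look significant by chance alone. P-values computed after a stepwise search also tend to be optimistic.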

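# Start from the model with all interactions (a * b * ... expands to every product term),
# then let step() add or drop terms by AIC; trace = 0 suppresses the step-by-step log
stepwise_model = step(
  lm(mpg ~ cylinders * displacement * horsepower * weight * acceleration * year * origin, data = auto),
  direction = 'both', trace = 0
)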

summary(stepwise_model)
## 
## Call:
## lm(formula = mpg ~ cylinders + displacement + horsepower + weight + 
##     acceleration + year + origin + cylinders:displacement + cylinders:horsepower + 
##     displacement:horsepower + cylinders:weight + displacement:weight + 
##     horsepower:weight + cylinders:acceleration + displacement:acceleration + 
##     horsepower:acceleration + weight:acceleration + cylinders:year + 
##     displacement:year + horsepower:year + weight:year + acceleration:year + 
##     cylinders:origin + displacement:origin + horsepower:origin + 
##     weight:origin + acceleration:origin + year:origin + cylinders:displacement:horsepower + 
##     cylinders:displacement:weight + cylinders:horsepower:weight + 
##     displacement:horsepower:weight + cylinders:displacement:acceleration + 
##     cylinders:horsepower:acceleration + displacement:horsepower:acceleration + 
##     cylinders:weight:acceleration + displacement:weight:acceleration + 
##     horsepower:weight:acceleration + cylinders:displacement:year + 
##     cylinders:horsepower:year + displacement:horsepower:year + 
##     cylinders:weight:year + displacement:weight:year + horsepower:weight:year + 
##     cylinders:acceleration:year + displacement:acceleration:year + 
##     horsepower:acceleration:year + weight:acceleration:year + 
##     cylinders:displacement:origin + cylinders:horsepower:origin + 
##     displacement:horsepower:origin + cylinders:weight:origin + 
##     displacement:weight:origin + horsepower:weight:origin + cylinders:acceleration:origin + 
##     displacement:acceleration:origin + horsepower:acceleration:origin + 
##     weight:acceleration:origin + cylinders:year:origin + displacement:year:origin + 
##     horsepower:year:origin + weight:year:origin + acceleration:year:origin + 
##     cylinders:displacement:horsepower:weight + cylinders:displacement:horsepower:acceleration + 
##     cylinders:displacement:weight:acceleration + cylinders:horsepower:weight:acceleration + 
##     displacement:horsepower:weight:acceleration + cylinders:displacement:horsepower:year + 
##     cylinders:displacement:weight:year + cylinders:horsepower:weight:year + 
##     displacement:horsepower:weight:year + cylinders:displacement:acceleration:year + 
##     cylinders:horsepower:acceleration:year + displacement:horsepower:acceleration:year + 
##     displacement:weight:acceleration:year + horsepower:weight:acceleration:year + 
##     cylinders:displacement:horsepower:origin + cylinders:displacement:weight:origin + 
##     cylinders:horsepower:weight:origin + displacement:horsepower:weight:origin + 
##     cylinders:displacement:acceleration:origin + displacement:horsepower:acceleration:origin + 
##     cylinders:weight:acceleration:origin + displacement:weight:acceleration:origin + 
##     horsepower:weight:acceleration:origin + cylinders:displacement:year:origin + 
##     cylinders:horsepower:year:origin + cylinders:weight:year:origin + 
##     horsepower:weight:year:origin + cylinders:acceleration:year:origin + 
##     displacement:acceleration:year:origin + cylinders:displacement:horsepower:weight:acceleration + 
##     cylinders:displacement:horsepower:weight:year + cylinders:displacement:horsepower:acceleration:year + 
##     displacement:horsepower:weight:acceleration:year + cylinders:displacement:horsepower:weight:origin + 
##     displacement:horsepower:weight:acceleration:origin, data = auto)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.6216 -1.0616 -0.0128  0.9286  9.2300 
## 
## Coefficients:
##                                                         Estimate Std. Error
## (Intercept)                                            3.398e+04  2.742e+04
## cylinders                                             -1.053e+04  6.739e+03
## displacement                                          -9.496e+02  6.373e+02
## horsepower                                            -8.256e+02  5.865e+02
## weight                                                 3.993e+01  2.470e+01
## acceleration                                          -5.148e+03  3.668e+03
## year                                                   7.021e+02  3.951e+02
## origin                                                -3.909e+04  2.641e+04
## cylinders:displacement                                 2.479e+02  1.590e+02
## cylinders:horsepower                                   2.234e+02  1.463e+02
## displacement:horsepower                                5.765e+00  3.828e+00
## cylinders:weight                                      -9.089e+00  6.186e+00
## displacement:weight                                    1.188e-01  9.094e-02
## horsepower:weight                                      2.286e-01  1.792e-01
## cylinders:acceleration                                 1.375e+03  9.175e+02
## displacement:acceleration                              5.857e+00  4.567e+00
## horsepower:acceleration                               -2.529e-01  1.718e+00
## weight:acceleration                                   -2.081e-01  6.665e-02
## cylinders:year                                        -1.539e+02  9.982e+01
## displacement:year                                      4.605e+00  3.350e+00
## horsepower:year                                       -9.157e-01  6.448e-01
## weight:year                                           -7.774e-01  4.972e-01
## acceleration:year                                      5.603e+01  4.002e+01
## cylinders:origin                                       1.037e+04  6.633e+03
## displacement:origin                                    9.782e+02  6.328e+02
## horsepower:origin                                      8.746e+02  5.799e+02
## weight:origin                                         -3.593e+01  2.491e+01
## acceleration:origin                                    5.123e+03  3.656e+03
## year:origin                                           -6.564e+02  4.006e+02
## cylinders:displacement:horsepower                     -1.518e+00  9.565e-01
## cylinders:displacement:weight                         -3.340e-02  2.261e-02
## cylinders:horsepower:weight                           -6.587e-02  4.462e-02
## displacement:horsepower:weight                        -1.598e-03  1.144e-03
## cylinders:displacement:acceleration                   -2.054e+00  1.125e+00
## cylinders:horsepower:acceleration                     -6.146e-01  3.540e-01
## displacement:horsepower:acceleration                   8.598e-03  1.261e-02
## cylinders:weight:acceleration                          1.527e-02  7.111e-03
## displacement:weight:acceleration                       8.317e-04  2.847e-04
## horsepower:weight:acceleration                         1.657e-03  7.240e-04
## cylinders:displacement:year                           -1.257e+00  8.279e-01
## cylinders:horsepower:year                              7.418e-02  1.002e-01
## displacement:horsepower:year                           2.044e-03  3.333e-03
## cylinders:weight:year                                  1.850e-01  1.249e-01
## displacement:weight:year                               1.953e-04  1.012e-04
## horsepower:weight:year                                 4.711e-04  2.620e-04
## cylinders:acceleration:year                           -1.486e+01  9.970e+00
## displacement:acceleration:year                         8.143e-03  1.521e-02
## horsepower:acceleration:year                          -2.020e-02  1.858e-02
## weight:acceleration:year                               1.354e-03  6.612e-04
## cylinders:displacement:origin                         -2.481e+02  1.586e+02
## cylinders:horsepower:origin                           -2.237e+02  1.455e+02
## displacement:horsepower:origin                        -5.994e+00  3.785e+00
## cylinders:weight:origin                                8.770e+00  6.215e+00
## displacement:weight:origin                            -1.365e-01  8.911e-02
## horsepower:weight:origin                              -2.684e-01  1.764e-01
## cylinders:acceleration:origin                         -1.306e+03  9.171e+02
## displacement:acceleration:origin                      -6.365e+00  4.359e+00
## horsepower:acceleration:origin                         7.977e-01  3.847e-01
## weight:acceleration:origin                             6.408e-02  3.311e-02
## cylinders:year:origin                                  1.593e+02  1.004e+02
## displacement:year:origin                              -4.888e+00  3.284e+00
## horsepower:year:origin                                 5.032e-01  2.315e-01
## weight:year:origin                                     7.339e-01  5.007e-01
## acceleration:year:origin                              -5.441e+01  3.984e+01
## cylinders:displacement:horsepower:weight               4.356e-04  2.864e-04
## cylinders:displacement:horsepower:acceleration         1.973e-03  1.773e-03
## cylinders:displacement:weight:acceleration            -2.318e-05  1.303e-05
## cylinders:horsepower:weight:acceleration              -5.812e-05  3.006e-05
## displacement:horsepower:weight:acceleration           -8.167e-06  2.890e-06
## cylinders:displacement:horsepower:year                 6.352e-05  4.238e-04
## cylinders:displacement:weight:year                    -1.416e-05  1.086e-05
## cylinders:horsepower:weight:year                      -3.530e-05  2.493e-05
## displacement:horsepower:weight:year                   -1.685e-06  1.035e-06
## cylinders:displacement:acceleration:year               3.905e-03  2.407e-03
## cylinders:horsepower:acceleration:year                 9.913e-03  4.724e-03
## displacement:horsepower:acceleration:year              1.674e-05  1.558e-04
## displacement:weight:acceleration:year                 -6.470e-06  3.001e-06
## horsepower:weight:acceleration:year                   -1.109e-05  7.578e-06
## cylinders:displacement:horsepower:origin               1.523e+00  9.518e-01
## cylinders:displacement:weight:origin                   3.487e-02  2.244e-02
## cylinders:horsepower:weight:origin                     6.957e-02  4.436e-02
## displacement:horsepower:weight:origin                  1.758e-03  1.129e-03
## cylinders:displacement:acceleration:origin             1.818e+00  1.112e+00
## displacement:horsepower:acceleration:origin           -6.105e-03  3.195e-03
## cylinders:weight:acceleration:origin                  -9.371e-03  6.239e-03
## displacement:weight:acceleration:origin               -1.793e-04  8.734e-05
## horsepower:weight:acceleration:origin                 -3.753e-04  1.689e-04
## cylinders:displacement:year:origin                     1.247e+00  8.219e-01
## cylinders:horsepower:year:origin                      -1.023e-01  5.713e-02
## cylinders:weight:year:origin                          -1.821e-01  1.253e-01
## horsepower:weight:year:origin                         -4.899e-05  2.984e-05
## cylinders:acceleration:year:origin                     1.374e+01  9.962e+00
## displacement:acceleration:year:origin                 -6.591e-03  2.343e-03
## cylinders:displacement:horsepower:weight:acceleration  2.296e-07  1.342e-07
## cylinders:displacement:horsepower:weight:year          1.407e-07  1.137e-07
## cylinders:displacement:horsepower:acceleration:year   -3.300e-05  2.414e-05
## displacement:horsepower:weight:acceleration:year       4.934e-08  2.531e-08
## cylinders:displacement:horsepower:weight:origin       -4.505e-04  2.849e-04
## displacement:horsepower:weight:acceleration:origin     2.700e-06  1.290e-06
##                                                       t value Pr(>|t|)   
## (Intercept)                                             1.239  0.21621   
## cylinders                                              -1.562  0.11930   
## displacement                                           -1.490  0.13729   
## horsepower                                             -1.408  0.16026   
## weight                                                  1.617  0.10705   
## acceleration                                           -1.404  0.16148   
## year                                                    1.777  0.07663 . 
## origin                                                 -1.480  0.13988   
## cylinders:displacement                                  1.559  0.12017   
## cylinders:horsepower                                    1.527  0.12776   
## displacement:horsepower                                 1.506  0.13309   
## cylinders:weight                                       -1.469  0.14285   
## displacement:weight                                     1.307  0.19230   
## horsepower:weight                                       1.276  0.20306   
## cylinders:acceleration                                  1.499  0.13502   
## displacement:acceleration                               1.282  0.20071   
## horsepower:acceleration                                -0.147  0.88305   
## weight:acceleration                                    -3.123  0.00197 **
## cylinders:year                                         -1.542  0.12415   
## displacement:year                                       1.375  0.17030   
## horsepower:year                                        -1.420  0.15663   
## weight:year                                            -1.564  0.11901   
## acceleration:year                                       1.400  0.16256   
## cylinders:origin                                        1.564  0.11896   
## displacement:origin                                     1.546  0.12326   
## horsepower:origin                                       1.508  0.13257   
## weight:origin                                          -1.442  0.15031   
## acceleration:origin                                     1.401  0.16216   
## year:origin                                            -1.639  0.10238   
## cylinders:displacement:horsepower                      -1.587  0.11351   
## cylinders:displacement:weight                          -1.477  0.14075   
## cylinders:horsepower:weight                            -1.476  0.14096   
## displacement:horsepower:weight                         -1.397  0.16335   
## cylinders:displacement:acceleration                    -1.826  0.06884 . 
## cylinders:horsepower:acceleration                      -1.736  0.08356 . 
## displacement:horsepower:acceleration                    0.682  0.49594   
## cylinders:weight:acceleration                           2.147  0.03263 * 
## displacement:weight:acceleration                        2.922  0.00375 **
## horsepower:weight:acceleration                          2.289  0.02277 * 
## cylinders:displacement:year                            -1.518  0.12998   
## cylinders:horsepower:year                               0.741  0.45957   
## displacement:horsepower:year                            0.613  0.54003   
## cylinders:weight:year                                   1.481  0.13956   
## displacement:weight:year                                1.930  0.05454 . 
## horsepower:weight:year                                  1.798  0.07319 . 
## cylinders:acceleration:year                            -1.490  0.13726   
## displacement:acceleration:year                          0.535  0.59288   
## horsepower:acceleration:year                           -1.087  0.27779   
## weight:acceleration:year                                2.048  0.04144 * 
## cylinders:displacement:origin                          -1.565  0.11878   
## cylinders:horsepower:origin                            -1.538  0.12514   
## displacement:horsepower:origin                         -1.583  0.11440   
## cylinders:weight:origin                                 1.411  0.15926   
## displacement:weight:origin                             -1.532  0.12666   
## horsepower:weight:origin                               -1.522  0.12921   
## cylinders:acceleration:origin                          -1.424  0.15561   
## displacement:acceleration:origin                       -1.460  0.14529   
## horsepower:acceleration:origin                          2.073  0.03901 * 
## weight:acceleration:origin                              1.935  0.05393 . 
## cylinders:year:origin                                   1.586  0.11372   
## displacement:year:origin                               -1.488  0.13774   
## horsepower:year:origin                                  2.173  0.03057 * 
## weight:year:origin                                      1.466  0.14376   
## acceleration:year:origin                               -1.366  0.17304   
## cylinders:displacement:horsepower:weight                1.521  0.12936   
## cylinders:displacement:horsepower:acceleration          1.113  0.26677   
## cylinders:displacement:weight:acceleration             -1.779  0.07634 . 
## cylinders:horsepower:weight:acceleration               -1.933  0.05418 . 
## displacement:horsepower:weight:acceleration            -2.826  0.00504 **
## cylinders:displacement:horsepower:year                  0.150  0.88096   
## cylinders:displacement:weight:year                     -1.304  0.19329   
## cylinders:horsepower:weight:year                       -1.416  0.15780   
## displacement:horsepower:weight:year                    -1.627  0.10475   
## cylinders:displacement:acceleration:year                1.622  0.10581   
## cylinders:horsepower:acceleration:year                  2.098  0.03672 * 
## displacement:horsepower:acceleration:year               0.107  0.91450   
## displacement:weight:acceleration:year                  -2.156  0.03190 * 
## horsepower:weight:acceleration:year                    -1.463  0.14444   
## cylinders:displacement:horsepower:origin                1.601  0.11056   
## cylinders:displacement:weight:origin                    1.554  0.12130   
## cylinders:horsepower:weight:origin                      1.568  0.11786   
## displacement:horsepower:weight:origin                   1.557  0.12050   
## cylinders:displacement:acceleration:origin              1.635  0.10303   
## displacement:horsepower:acceleration:origin            -1.911  0.05698 . 
## cylinders:weight:acceleration:origin                   -1.502  0.13420   
## displacement:weight:acceleration:origin                -2.053  0.04096 * 
## horsepower:weight:acceleration:origin                  -2.223  0.02700 * 
## cylinders:displacement:year:origin                      1.517  0.13037   
## cylinders:horsepower:year:origin                       -1.790  0.07441 . 
## cylinders:weight:year:origin                           -1.453  0.14716   
## horsepower:weight:year:origin                          -1.642  0.10176   
## cylinders:acceleration:year:origin                      1.380  0.16879   
## displacement:acceleration:year:origin                  -2.813  0.00524 **
## cylinders:displacement:horsepower:weight:acceleration   1.711  0.08805 . 
## cylinders:displacement:horsepower:weight:year           1.237  0.21693   
## cylinders:displacement:horsepower:acceleration:year    -1.367  0.17271   
## displacement:horsepower:weight:acceleration:year        1.950  0.05217 . 
## cylinders:displacement:horsepower:weight:origin        -1.581  0.11487   
## displacement:horsepower:weight:acceleration:origin      2.093  0.03717 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.371 on 293 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.9308, Adjusted R-squared:  0.9077 
## F-statistic: 40.23 on 98 and 293 DF,  p-value: < 2.2e-16
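The two formula operators work as follows: a * b is shorthand for a + b + a:b, while a:b adds only the product term. The small sketch below assumes ISLR2's Auto data. It shows that both spellings give the same model and extracts one interaction's p-value. That p-value comes from a two-predictor model, so it will differ from the one in the stepwise model above.

```r
library(ISLR2)  # assumption: ISLR2's Auto data

# `*` expands to main effects plus the interaction
attr(terms(mpg ~ weight * acceleration), "term.labels")
# "weight" "acceleration" "weight:acceleration"

fit_star  <- lm(mpg ~ weight * acceleration, data = Auto)
fit_colon <- lm(mpg ~ weight + acceleration + weight:acceleration, data = Auto)
all.equal(coef(fit_star), coef(fit_colon))  # TRUE: the two formulas describe the same model

# p-value of the interaction term in this two-predictor model
summary(fit_star)$coefficients["weight:acceleration", "Pr(>|t|)"]
```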

Part f) Try a few different transformations of the variables, such as log(X), √X, X^2. Comment on your findings.

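Here the transformations are applied to the response, mpg, using the same seven predictors. Each can therefore be compared with the untransformed fit from part (c), which has R^2 = 0.8215.

Log transformation: compresses large values and can stabilize the variance. In the Residuals vs Fitted plot, the residuals are spread more evenly across the fitted values, which is consistent with a more constant variance. This model also has the highest R^2 of the three (0.8795, adjusted 0.8773).

Square root transformation: a milder compression that can help with right-skewed data. The Q-Q plot lies closer to a straight line, which suggests the residuals are closer to normally distributed. R^2 is 0.8561 (adjusted 0.8535).

Square transformation: squaring mpg stretches large values instead of compressing them. It is also not the same as modeling a quadratic relationship, which would square a predictor (for example, adding I(horsepower^2)). The residuals in the Scale-Location plot looked less spread out, but R^2 falls to 0.7292 (adjusted 0.7243), the worst of the three fits.

Each model predicts mpg on a different scale, so these R^2 values are only a rough comparison. Overall, the log transformation looks the most helpful.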

auto$log_mpg = log(auto$mpg)
auto$sqrt_mpg = sqrt(auto$mpg)
auto$sq_mpg = auto$mpg^2
log_model = lm(log_mpg ~ cylinders + displacement + horsepower + weight + acceleration + year + origin, data=auto)

summary(log_model)
## 
## Call:
## lm(formula = log_mpg ~ cylinders + displacement + horsepower + 
##     weight + acceleration + year + origin, data = auto)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.40955 -0.06533  0.00079  0.06785  0.33925 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.751e+00  1.662e-01  10.533  < 2e-16 ***
## cylinders    -2.795e-02  1.157e-02  -2.415  0.01619 *  
## displacement  6.362e-04  2.690e-04   2.365  0.01852 *  
## horsepower   -1.475e-03  4.935e-04  -2.989  0.00298 ** 
## weight       -2.551e-04  2.334e-05 -10.931  < 2e-16 ***
## acceleration -1.348e-03  3.538e-03  -0.381  0.70339    
## year          2.958e-02  1.824e-03  16.211  < 2e-16 ***
## origin        4.071e-02  9.955e-03   4.089 5.28e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1191 on 384 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.8795, Adjusted R-squared:  0.8773 
## F-statistic: 400.4 on 7 and 384 DF,  p-value: < 2.2e-16
sqrt_model = lm(sqrt_mpg ~ cylinders + displacement + horsepower + weight + acceleration + year + origin, data=auto)

summary(sqrt_model)
## 
## Call:
## lm(formula = sqrt_mpg ~ cylinders + displacement + horsepower + 
##     weight + acceleration + year + origin, data = auto)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.98891 -0.18946  0.00505  0.16947  1.02581 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.075e+00  4.290e-01   2.506   0.0126 *  
## cylinders    -5.942e-02  2.986e-02  -1.990   0.0474 *  
## displacement  1.752e-03  6.942e-04   2.524   0.0120 *  
## horsepower   -2.512e-03  1.274e-03  -1.972   0.0493 *  
## weight       -6.367e-04  6.024e-05 -10.570  < 2e-16 ***
## acceleration  2.738e-03  9.131e-03   0.300   0.7644    
## year          7.381e-02  4.709e-03  15.675  < 2e-16 ***
## origin        1.217e-01  2.569e-02   4.735 3.09e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3074 on 384 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.8561, Adjusted R-squared:  0.8535 
## F-statistic: 326.3 on 7 and 384 DF,  p-value: < 2.2e-16
squared_model = lm(sq_mpg ~ cylinders + displacement + horsepower + weight + acceleration + year + origin, data=auto)

summary(squared_model)
## 
## Call:
## lm(formula = sq_mpg ~ cylinders + displacement + horsepower + 
##     weight + acceleration + year + origin, data = auto)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -483.45 -141.87  -19.62  103.58 1042.84 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.878e+03  2.928e+02  -6.412 4.22e-10 ***
## cylinders    -1.436e+01  2.038e+01  -0.704  0.48157    
## displacement  1.328e+00  4.738e-01   2.802  0.00534 ** 
## horsepower   -3.587e-01  8.693e-01  -0.413  0.68009    
## weight       -3.522e-01  4.111e-02  -8.567 2.62e-16 ***
## acceleration  9.278e+00  6.232e+00   1.489  0.13740    
## year          4.081e+01  3.214e+00  12.698  < 2e-16 ***
## origin        9.509e+01  1.754e+01   5.422 1.04e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 209.8 on 384 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.7292, Adjusted R-squared:  0.7243 
## F-statistic: 147.8 on 7 and 384 DF,  p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(log_model)

plot(sqrt_model)

plot(squared_model)
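The question mentions transforming the predictors (X), while the models above transform the response. The sketch below assumes ISLR2's Auto data and uses horsepower as the example predictor (an illustrative choice, not part of the analysis above). It fits log, square-root, and quadratic versions of that single predictor and compares R^2. Every model has the same response, so R^2 is directly comparable here. I() is needed so that ^ is treated as arithmetic rather than as formula crossing.

```r
library(ISLR2)  # assumption: ISLR2's Auto data

# Transform a predictor (horsepower) rather than the response
fits <- list(
  linear    = lm(mpg ~ horsepower, data = Auto),
  log_x     = lm(mpg ~ log(horsepower), data = Auto),
  sqrt_x    = lm(mpg ~ sqrt(horsepower), data = Auto),
  quadratic = lm(mpg ~ horsepower + I(horsepower^2), data = Auto)
)

# R^2 is comparable here because every model has the same response, mpg
r2 <- sapply(fits, function(f) summary(f)$r.squared)
round(r2, 4)
```

The quadratic model contains the linear model as a special case, so its R^2 can never be lower. An F-test via anova() between the two would show whether the improvement is more than chance.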

Question 10.

head(Carseats)
##   Sales CompPrice Income Advertising Population Price ShelveLoc Age Education
## 1  9.50       138     73          11        276   120       Bad  42        17
## 2 11.22       111     48          16        260    83      Good  65        10
## 3 10.06       113     35          10        269    80    Medium  59        12
## 4  7.40       117    100           4        466    97    Medium  55        14
## 5  4.15       141     64           3        340   128       Bad  38        13
## 6 10.81       124    113          13        501    72       Bad  78        16
##   Urban  US
## 1   Yes Yes
## 2   Yes Yes
## 3   Yes Yes
## 4   Yes Yes
## 5   Yes  No
## 6    No Yes
summary(Carseats)
##      Sales          CompPrice       Income        Advertising    
##  Min.   : 0.000   Min.   : 77   Min.   : 21.00   Min.   : 0.000  
##  1st Qu.: 5.390   1st Qu.:115   1st Qu.: 42.75   1st Qu.: 0.000  
##  Median : 7.490   Median :125   Median : 69.00   Median : 5.000  
##  Mean   : 7.496   Mean   :125   Mean   : 68.66   Mean   : 6.635  
##  3rd Qu.: 9.320   3rd Qu.:135   3rd Qu.: 91.00   3rd Qu.:12.000  
##  Max.   :16.270   Max.   :175   Max.   :120.00   Max.   :29.000  
##    Population        Price        ShelveLoc        Age          Education   
##  Min.   : 10.0   Min.   : 24.0   Bad   : 96   Min.   :25.00   Min.   :10.0  
##  1st Qu.:139.0   1st Qu.:100.0   Good  : 85   1st Qu.:39.75   1st Qu.:12.0  
##  Median :272.0   Median :117.0   Medium:219   Median :54.50   Median :14.0  
##  Mean   :264.8   Mean   :115.8                Mean   :53.32   Mean   :13.9  
##  3rd Qu.:398.5   3rd Qu.:131.0                3rd Qu.:66.00   3rd Qu.:16.0  
##  Max.   :509.0   Max.   :191.0                Max.   :80.00   Max.   :18.0  
##  Urban       US     
##  No :118   No :142  
##  Yes:282   Yes:258  
##                     
##                     
##                     
## 

Part a) Fit a multiple regression model to predict Sales using Price, Urban, and US.

carseats_model = lm(Sales ~ Price + Urban + US, data = Carseats)
summary(carseats_model)
## 
## Call:
## lm(formula = Sales ~ Price + Urban + US, data = Carseats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9206 -1.6220 -0.0564  1.5786  7.0581 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.043469   0.651012  20.036  < 2e-16 ***
## Price       -0.054459   0.005242 -10.389  < 2e-16 ***
## UrbanYes    -0.021916   0.271650  -0.081    0.936    
## USYes        1.200573   0.259042   4.635 4.86e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2335 
## F-statistic: 41.52 on 3 and 396 DF,  p-value: < 2.2e-16
coef(carseats_model)[4]
##    USYes 
## 1.200573
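To see how lm() handles the qualitative predictors, here is a minimal sketch on a small made-up data frame. The values in toy are hypothetical, not taken from Carseats. model.matrix() shows the design matrix that lm() builds: each two-level factor becomes a single 0/1 column named after its non-baseline level, which is where the UrbanYes and USYes rows in the summary come from.

```r
# Hypothetical three-row data frame mimicking the Carseats column types
toy = data.frame(
  Price = c(120, 95, 140),
  Urban = factor(c("Yes", "No", "Yes")),
  US    = factor(c("No", "Yes", "Yes"))
)

# Design matrix built with R's default treatment contrasts;
# "No" is the baseline level because it sorts first alphabetically
X = model.matrix(~ Price + Urban + US, data = toy)
print(X)
```

Because "No" is the baseline, the UrbanYes and USYes coefficients measure the shift relative to rural and non-US stores respectively.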

Part b) Provide an interpretation of each coefficient in the model. Be careful—some of the variables in the model are qualitative!

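Sales is measured in thousands of units, so each coefficient is read in those units. The intercept, 13.04, is the predicted Sales for a rural store outside the US charging a price of $0; since no store in the data charges less than $24, it mainly serves as a baseline. The coefficient for Price is -0.054459: holding Urban and US fixed, each $1 increase in the price of the car seat is associated with a drop in sales of about 0.054 thousand, or roughly 54 units. The coefficient for UrbanYes is -0.021916: urban stores sell about 22 fewer units than rural stores with the same price and US status, but this estimate is not statistically significant (p = 0.936). The coefficient for USYes is 1.200573: on average, stores in the US sell about 1,201 more units than stores outside the US, holding Price and Urban fixed.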
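Because Sales is recorded in thousands of units, converting a coefficient into individual car seats means multiplying by 1,000. The sketch below hard-codes the estimates printed in the summary of carseats_model; the $10 price change is an arbitrary illustrative value, not something from the exercise.

```r
# Estimates copied from summary(carseats_model)
b_price = -0.054459   # change in Sales (thousands of units) per $1 of Price
b_us    =  1.200573   # US store vs non-US store, in thousands of units

# Effect of raising Price by $10, holding Urban and US fixed, in units sold
price_effect_units = 10 * b_price * 1000

# Difference between otherwise identical US and non-US stores, in units sold
us_effect_units = b_us * 1000

round(c(price_10 = price_effect_units, us = us_effect_units))
```

This gives about -545 units for a $10 price increase and about +1,201 units for a US store.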

Part c) Write out the model in equation form, being careful to handle the qualitative variables properly.

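\(\widehat{Sales} = 13.04 - 0.054\,Price - 0.022\,Urban + 1.20\,US\), where \(Urban = 1\) for a store in an urban location and 0 otherwise, and \(US = 1\) for a store in the US and 0 otherwise.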
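The fitted equation can be written as a small R function, with the qualitative predictors entering as 0/1 indicators. predict_sales is a hypothetical helper written for illustration, with coefficients copied from summary(carseats_model); for real predictions, predict(carseats_model, newdata = ...) does the same job from the fitted object.

```r
# Fitted equation from part (a); urban and us are logical flags
# standing in for the Yes/No factor levels (TRUE = "Yes")
predict_sales = function(price, urban, us) {
  13.043469 - 0.054459 * price - 0.021916 * as.numeric(urban) +
    1.200573 * as.numeric(us)
}

# Rural US store charging $100: 13.043469 - 5.4459 + 1.200573
predict_sales(price = 100, urban = FALSE, us = TRUE)
```

Switching urban from FALSE to TRUE lowers the prediction by exactly 0.021916 (about 22 units), which is the Urban coefficient.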

Part d) For which of the predictors can you reject the null hypothesis \(H_0 : \beta_j = 0\)?

We can reject \(H_0 : \beta_j = 0\) for Price (p < 2e-16) and US (p = 4.86e-06). We cannot reject it for Urban (p = 0.936), so there is no evidence that an urban location is associated with Sales once Price and US are accounted for.
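The t values and p-values in the summary come from dividing each estimate by its standard error and comparing against a t distribution with the residual degrees of freedom. A minimal sketch reproducing them from the printed numbers (the names est, se, and df_resid are illustrative):

```r
# Estimates and standard errors copied from summary(carseats_model)
est = c(Price = -0.054459, UrbanYes = -0.021916, USYes = 1.200573)
se  = c(Price =  0.005242, UrbanYes =  0.271650, USYes = 0.259042)
df_resid = 396  # 400 observations minus 4 estimated coefficients

# t statistic for H0: beta_j = 0 and its two-sided p-value
t_stat  = est / se
p_value = 2 * pt(-abs(t_stat), df = df_resid)

round(cbind(t_stat, p_value), 4)
```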

Part e) On the basis of your response to the previous question, fit a smaller model that only uses the predictors for which there is evidence of association with the outcome.

carseats_model_2 = lm(Sales ~ Price + US, data = Carseats)
summary(carseats_model_2)
## 
## Call:
## lm(formula = Sales ~ Price + US, data = Carseats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9269 -1.6286 -0.0574  1.5766  7.0515 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.03079    0.63098  20.652  < 2e-16 ***
## Price       -0.05448    0.00523 -10.416  < 2e-16 ***
## USYes        1.19964    0.25846   4.641 4.71e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2354 
## F-statistic: 62.43 on 2 and 397 DF,  p-value: < 2.2e-16

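Part f) How well do the models in (a) and (e) fit the data?

Neither model fits the data well. Both have a multiple R-squared of 0.2393, so they explain only about 24% of the variance in Sales. The smaller model from (e) is marginally better: its adjusted R-squared is 0.2354 versus 0.2335 for (a), and its residual standard error is 2.469 versus 2.472. Dropping Urban therefore loses essentially no explanatory power.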
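Adjusted R-squared penalizes each extra predictor, which is why the two models differ on it even though their R-squared agrees to four decimals. A quick check using the standard formula \(1 - (1 - R^2)(n - 1)/(n - p - 1)\), with n = 400 and p the number of predictors (adj_r2 is a hypothetical helper):

```r
# Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adj_r2 = function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n  = 400      # rows in Carseats
r2 = 0.2393   # Multiple R-squared printed for both models

adj_r2(r2, n, p = 3)  # model from (a): Price, Urban, US
adj_r2(r2, n, p = 2)  # model from (e): Price, US
```

Because 0.2393 is already rounded, the second value can differ from the printed 0.2354 in the fourth decimal place.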

Part g) Using the model from (e), obtain 95% confidence intervals for the coefficient(s).

confint(carseats_model_2)
##                   2.5 %      97.5 %
## (Intercept) 11.79032020 14.27126531
## Price       -0.06475984 -0.04419543
## USYes        0.69151957  1.70776632
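confint() for an lm fit uses the interval estimate ± t quantile × standard error, with the model's residual degrees of freedom (397 here). A minimal sketch rebuilding the slope intervals from the rounded estimates printed in summary(carseats_model_2):

```r
# Estimates and standard errors copied from summary(carseats_model_2)
est = c(Price = -0.05448, USYes = 1.19964)
se  = c(Price =  0.00523, USYes = 0.25846)

# 95% interval: estimate +/- t quantile (397 residual df) * standard error
t_crit = qt(0.975, df = 397)
ci = cbind(lower = est - t_crit * se, upper = est + t_crit * se)
ci
```

Small differences from the confint() output come from the rounding of the inputs. Neither interval contains 0, consistent with part (d).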

Part h) Is there evidence of outliers or high leverage observations in the model from (e)?

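There is some evidence of high-leverage observations but little evidence of outliers. In the Residuals vs Leverage plot, a few points sit well to the right of the main cloud, and the plot labels observations 26, 368, and 50. The largest residual (7.05) is less than three times the residual standard error (2.469), so no observation stands out as an extreme outlier. The influence.measures output flags 23 observations with asterisks, mostly for the covariance ratio; observations 43, 126, 166, 175, 314, and 368 have flagged hat values (high leverage), and 26, 50, and 368 have flagged DFFITS. Every Cook's distance is at most 0.03, so none of these points appears to change the fit much on its own.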

par(mfrow=c(2,2))
plot(carseats_model_2)

summary(influence.measures(carseats_model_2))
## Potentially influential observations of
##   lm(formula = Sales ~ Price + US, data = Carseats) :
## 
##     dfb.1_ dfb.Pric dfb.USYs dffit   cov.r   cook.d hat    
## 26   0.24  -0.18    -0.17     0.28_*  0.97_*  0.03   0.01  
## 29  -0.10   0.10    -0.10    -0.18    0.97_*  0.01   0.01  
## 43  -0.11   0.10     0.03    -0.11    1.05_*  0.00   0.04_*
## 50  -0.10   0.17    -0.17     0.26_*  0.98    0.02   0.01  
## 51  -0.05   0.05    -0.11    -0.18    0.95_*  0.01   0.00  
## 58  -0.05  -0.02     0.16    -0.20    0.97_*  0.01   0.01  
## 69  -0.09   0.10     0.09     0.19    0.96_*  0.01   0.01  
## 126 -0.07   0.06     0.03    -0.07    1.03_*  0.00   0.03_*
## 160  0.00   0.00     0.00     0.01    1.02_*  0.00   0.02  
## 166  0.21  -0.23    -0.04    -0.24    1.02    0.02   0.03_*
## 172  0.06  -0.07     0.02     0.08    1.03_*  0.00   0.02  
## 175  0.14  -0.19     0.09    -0.21    1.03_*  0.02   0.03_*
## 210 -0.14   0.15    -0.10    -0.22    0.97_*  0.02   0.01  
## 270 -0.03   0.05    -0.03     0.06    1.03_*  0.00   0.02  
## 298 -0.06   0.06    -0.09    -0.15    0.97_*  0.01   0.00  
## 314 -0.05   0.04     0.02    -0.05    1.03_*  0.00   0.02_*
## 353 -0.02   0.03     0.09     0.15    0.97_*  0.01   0.00  
## 357  0.02  -0.02     0.02    -0.03    1.03_*  0.00   0.02  
## 368  0.26  -0.23    -0.11     0.27_*  1.01    0.02   0.02_*
## 377  0.14  -0.15     0.12     0.24    0.95_*  0.02   0.01  
## 384  0.00   0.00     0.00     0.00    1.02_*  0.00   0.02  
## 387 -0.03   0.04    -0.03     0.05    1.02_*  0.00   0.02  
## 396 -0.05   0.05     0.08     0.14    0.98_*  0.01   0.00
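The asterisks in the table come from fixed cutoffs that depend only on n and the number of coefficients k. The sketch below computes them following our reading of the ?influence.measures help page (|DFBETAS| > 1 is a further rule not shown); treat the exact rules as an assumption to check against that documentation.

```r
# Sample size and number of estimated coefficients in carseats_model_2
n = 400
k = 3   # intercept, Price, USYes

# Flagging cutoffs as we understand influence.measures() applies them
cutoffs = c(
  dffit    = 3 * sqrt(k / (n - k)),  # flag if |DFFITS| exceeds this
  cov_r    = 3 * k / (n - k),        # flag if |1 - COVRATIO| exceeds this
  cook     = qf(0.5, k, n - k),      # flag if Cook's distance exceeds the F(k, n - k) median
  hat      = 3 * k / n,              # flag if leverage exceeds this
  mean_hat = k / n                   # average leverage, for comparison
)
round(cutoffs, 4)
```

Against these cutoffs, the flagged hat values in the table (0.02 to 0.04) are roughly three to five times the average leverage of 0.0075, while every Cook's distance (at most 0.03) is far below the Cook's cutoff.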

Question 12.

Part a) Recall that the coefficient estimate \(\hat{\beta}\) for the linear regression of Y onto X without an intercept is given by (3.38). Under what circumstance is the coefficient estimate for the regression of X onto Y the same as the coefficient estimate for the regression of Y onto X?

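From (3.38), the estimate for the regression of Y onto X without an intercept is \(\hat{\beta} = \sum_i x_i y_i / \sum_i x_i^2\), and swapping the roles of the variables gives \(\sum_i x_i y_i / \sum_i y_i^2\) for the regression of X onto Y. The numerators are identical, so the two estimates are equal exactly when \(\sum_i x_i^2 = \sum_i y_i^2\) (or in the degenerate case \(\sum_i x_i y_i = 0\), where both estimates are zero). Perfect correlation is therefore neither necessary nor sufficient: what matters is that X and Y have the same sum of squares.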
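A quick check of the perfectly correlated case: substitute \(y_i = c\,x_i\) with \(c \neq 0\) into (3.38), so that the correlation between X and Y is \(\pm 1\) for every such \(c\).

\[
\hat{\beta}_{Y \text{ onto } X} = \frac{\sum_i x_i (c\,x_i)}{\sum_i x_i^2} = c,
\qquad
\hat{\beta}_{X \text{ onto } Y} = \frac{\sum_i x_i (c\,x_i)}{\sum_i (c\,x_i)^2} = \frac{c\sum_i x_i^2}{c^2 \sum_i x_i^2} = \frac{1}{c}.
\]

The two estimates agree only when \(c = 1/c\), that is \(c = \pm 1\), which is also exactly the case where \(\sum_i x_i^2 = \sum_i y_i^2\).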

Part b) Generate an example in R with n = 100 observations in which the coefficient estimate for the regression of X onto Y is different from the coefficient estimate for the regression of Y onto X.

set.seed(42)

# Generating a random set of data for X
n = 100
X = rnorm(n)

# Y as a linear function of X 
Y = 7 * X + rnorm(n, mean = 5, sd = 1)

# Fitting regression of X onto Y
model_X_on_Y = lm(X ~ Y)

# Fitting regression of Y onto X
model_Y_on_X = lm(Y ~ X)

# Output coefficients for both regressions
coef(model_Y_on_X)
## (Intercept)           X 
##    4.911633    7.027159
coef(model_X_on_Y)
## (Intercept)           Y 
##  -0.6879615   0.1401672
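The fits above include an intercept, whereas (3.38) describes regression without one. A minimal sketch of the no-intercept version, using + 0 in the formula, that also checks lm() against the closed-form ratios; the lower-case names x, y, b_yx, and b_xy are illustrative.

```r
set.seed(42)
n = 100
x = rnorm(n)
y = 7 * x + rnorm(n, mean = 5, sd = 1)

# "+ 0" drops the intercept, matching the setting of (3.38)
b_yx = unname(coef(lm(y ~ x + 0)))
b_xy = unname(coef(lm(x ~ y + 0)))

# Closed-form estimates: sum(x * y) / sum(x^2) and sum(x * y) / sum(y^2)
c(lm_y_on_x = b_yx, formula_y_on_x = sum(x * y) / sum(x^2),
  lm_x_on_y = b_xy, formula_x_on_y = sum(x * y) / sum(y^2))
```

Since sum(y^2) is much larger than sum(x^2) here, the X-onto-Y estimate is far smaller than the Y-onto-X estimate.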

Part c) Generate an example in R with n = 100 observations in which the coefficient estimate for the regression of X onto Y is the same as the coefficient estimate for the regression of Y onto X.

# Set seed for reproducibility
set.seed(42)

# Generating a random set of data for X
n = 100
X = rnorm(n)

# Creating Y identical to X, so both have the same sum of squares
Y = X

# Fitting regression of X onto Y
model_X_on_Y = lm(X ~ Y)

# Fitting regression of Y onto X
model_Y_on_X = lm(Y ~ X)

coef(model_Y_on_X)
##   (Intercept)             X 
## -1.665335e-17  1.000000e+00
coef(model_X_on_Y)
##   (Intercept)             Y 
## -1.665335e-17  1.000000e+00
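Setting Y equal to X is not the only construction. In the no-intercept setting of (3.38), any y with the same sum of squares as x works. A minimal sketch using a random reordering of x, which keeps the same values (so the sums of squares match) while scrambling the pairing, so the correlation is far from 1; the names are illustrative.

```r
set.seed(42)
n = 100
x = rnorm(n)

# y is a random permutation of x: same values, so sum(y^2) equals sum(x^2),
# but the pairing with x is scrambled
y = sample(x)

# No-intercept fits in both directions
b_yx = unname(coef(lm(y ~ x + 0)))
b_xy = unname(coef(lm(x ~ y + 0)))

c(y_on_x = b_yx, x_on_y = b_xy, correlation = cor(x, y))
```

The two estimates match up to floating-point error even though the correlation is nowhere near 1 or -1.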