# Chapter 3 Exercises

## Number 2) What is the difference between KNN regression and the KNN classifier?

Both methods start the same way: given a prediction point x0 and a chosen value of K, they identify the K training observations closest to x0. KNN regression then predicts a numeric response by averaging the responses of those K neighbors; whatever value of K you choose, it finds that many points and averages them. The KNN classifier instead predicts a qualitative response: it assigns x0 to the class that is most common among the K neighbors (a majority vote) rather than taking an average.

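A toy illustration in base R (the simulated data and the names x0 and nn are my own, not part of the exercise): the two methods share the neighbor search and differ only in the final step.

# Simulated data: one predictor, a numeric response, and a class label
set.seed(1)
x  = runif(50)
y  = 2 * x + rnorm(50, sd = 0.2)
cl = ifelse(x > 0.5, "A", "B")

x0 = 0.6                             # point to predict at
k  = 5
nn = order(abs(x - x0))[1:k]         # indices of the K nearest neighbors

mean(y[nn])                          # KNN regression: average the neighbors' y
names(which.max(table(cl[nn])))      # KNN classifier: majority vote on class
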
## Number 9)

9a) Produce a scatterplot matrix that includes all of the variables in the data set.

library(ISLR)   # loads the Auto and Carseats data sets used throughout
plot(Auto)      # on a data frame this draws a scatterplot matrix (equivalent to pairs(Auto))

9b) Compute the matrix of correlations between the variables, excluding the qualitative name variable.

as.numeric(Auto$horsepower)   # confirm horsepower is stored as a numeric vector
##   [1] 130 165 150 150 140 198 220 215 225 190 170 160 150 225  95  95  97  85
## (printout of the remaining horsepower values for all 392 cars omitted)
cor(Auto[sapply(Auto, is.numeric)])   # correlations among the numeric columns (drops name)
##                     mpg  cylinders displacement horsepower     weight
## mpg           1.0000000 -0.7776175   -0.8051269 -0.7784268 -0.8322442
## cylinders    -0.7776175  1.0000000    0.9508233  0.8429834  0.8975273
## displacement -0.8051269  0.9508233    1.0000000  0.8972570  0.9329944
## horsepower   -0.7784268  0.8429834    0.8972570  1.0000000  0.8645377
## weight       -0.8322442  0.8975273    0.9329944  0.8645377  1.0000000
## acceleration  0.4233285 -0.5046834   -0.5438005 -0.6891955 -0.4168392
## year          0.5805410 -0.3456474   -0.3698552 -0.4163615 -0.3091199
## origin        0.5652088 -0.5689316   -0.6145351 -0.4551715 -0.5850054
##              acceleration       year     origin
## mpg             0.4233285  0.5805410  0.5652088
## cylinders      -0.5046834 -0.3456474 -0.5689316
## displacement   -0.5438005 -0.3698552 -0.6145351
## horsepower     -0.6891955 -0.4163615 -0.4551715
## weight         -0.4168392 -0.3091199 -0.5850054
## acceleration    1.0000000  0.2903161  0.2127458
## year            0.2903161  1.0000000  0.1815277
## origin          0.2127458  0.1815277  1.0000000

9c) Perform a multiple linear regression with mpg as the response and all other variables except name as the predictors, and comment on the output.

lm.fit = lm(mpg ~ . - name, data = Auto)   # regress mpg on every predictor except name
summary(lm.fit)
## 
## Call:
## lm(formula = mpg ~ . - name, data = Auto)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.5903 -2.1565 -0.1169  1.8690 13.0604 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -17.218435   4.644294  -3.707  0.00024 ***
## cylinders     -0.493376   0.323282  -1.526  0.12780    
## displacement   0.019896   0.007515   2.647  0.00844 ** 
## horsepower    -0.016951   0.013787  -1.230  0.21963    
## weight        -0.006474   0.000652  -9.929  < 2e-16 ***
## acceleration   0.080576   0.098845   0.815  0.41548    
## year           0.750773   0.050973  14.729  < 2e-16 ***
## origin         1.426141   0.278136   5.127 4.67e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.328 on 384 degrees of freedom
## Multiple R-squared:  0.8215, Adjusted R-squared:  0.8182 
## F-statistic: 252.4 on 7 and 384 DF,  p-value: < 2.2e-16

Yes, there is a relationship between the predictors and mpg overall: the F-statistic of 252.4 has a p-value below 2.2e-16. Individually, displacement, weight, year, and origin show statistically significant relationships to mpg. The year coefficient of about 0.75 means that, holding the other predictors fixed, a car one model year newer is predicted to get roughly 0.75 more mpg, a positive relationship.

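As a quick sanity check of that interpretation, one can compare predictions for two hypothetical cars that differ only in model year (duplicating the first row of Auto here is just an illustrative stand-in):

newcars = Auto[c(1, 1), ]                  # two copies of the same car
newcars$year = newcars$year + c(0, 1)      # bump the model year on the second copy
diff(predict(lm.fit, newdata = newcars))   # ~0.75 mpg, matching the year coefficient
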
9d) Produce diagnostic plots of the linear regression fit and comment on any problems you see.

par(mfrow=c(2,2))
plot(lm.fit)

The residuals-vs-fitted plot shows some unusually large positive residuals in the upper tail (outliers) and a mild curve, hinting at non-linearity. The leverage plot identifies observation 14 as having unusually high leverage; a change to that single observation could shift the fit noticeably.

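The same point can be checked numerically with hatvalues():

which.max(hatvalues(lm.fit))                     # index of the highest-leverage observation
sort(hatvalues(lm.fit), decreasing = TRUE)[1:3]  # the three largest leverage values
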
9e) Use the * and : symbols to fit linear regression models with interaction effects. Do any interactions appear to be statistically significant? (See the sketch below.)

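A minimal sketch of how interactions could be fit; the particular terms chosen here (displacement * weight and horsepower:year) are illustrative picks, not prescribed by the exercise:

lm.int = lm(mpg ~ displacement * weight + horsepower:year, data = Auto)
summary(lm.int)   # look for small p-values on the interaction rows
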
9f) Try a few different transformations of the variables, such as log(X), sqrt(X), and X^2, and comment on the findings. (See the sketch below.)

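Again only a sketch, under the assumption that horsepower is the variable being transformed (any predictor would do):

lm.log = lm(mpg ~ log(horsepower), data = Auto)
lm.sq  = lm(mpg ~ horsepower + I(horsepower^2), data = Auto)
summary(lm.log)$r.squared   # compare the fits of the transformed models
summary(lm.sq)$r.squared
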
## Number 10)

10a) Fit a multiple regression model to predict Sales using Price, Urban, and US.

lm.car = lm(Sales ~ Price + Urban + US, data = Carseats)
plot(lm.car)   # diagnostic plots of the fit

summary(lm.car)
## 
## Call:
## lm(formula = Sales ~ Price + Urban + US, data = Carseats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9206 -1.6220 -0.0564  1.5786  7.0581 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.043469   0.651012  20.036  < 2e-16 ***
## Price       -0.054459   0.005242 -10.389  < 2e-16 ***
## UrbanYes    -0.021916   0.271650  -0.081    0.936    
## USYes        1.200573   0.259042   4.635 4.86e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2335 
## F-statistic: 41.52 on 3 and 396 DF,  p-value: < 2.2e-16

10b) Provide an interpretation of each coefficient in the model. Be careful: some of the variables in the model are qualitative!

Sales is recorded in thousands of units, so the coefficients read as follows. Price: each additional dollar of price is associated with about 0.054 fewer units sold, roughly 54 fewer car seats, holding Urban and US fixed. UrbanYes: urban stores sell about 22 fewer seats than non-urban stores, but this effect is not statistically significant (p = 0.936). USYes: stores in the US sell about 1,201 more seats than stores outside the US at the same price. The intercept (13.04) is the expected sales, in thousands, for a non-urban, non-US store at a price of zero.

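The qualitative predictors are handled through R's default dummy coding, which can be inspected directly:

contrasts(Carseats$Urban)   # UrbanYes: 1 for urban stores, 0 otherwise
contrasts(Carseats$US)      # USYes: 1 for US stores, 0 otherwise
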
10c) Write out the model in equation form, being careful to handle the qualitative variables properly.

\(Sales = 13.043469 - 0.054459 \, Price - 0.021916 \, Urban_{Yes} + 1.200573 \, US_{Yes}\), where \(Urban_{Yes} = 1\) if the store is in an urban location and 0 otherwise, and \(US_{Yes} = 1\) if the store is in the US and 0 otherwise.

10d) For which of the predictors can you reject the null hypothesis \(H_0 : \beta_j = 0\)?

Price and US: both have p-values far below 0.05. For Urban (p = 0.936) the null hypothesis cannot be rejected.

10e) On the basis of your response to the previous question, fit a smaller model that only uses the predictors for which there is evidence of association with the outcome.

lm.car = lm(Sales ~ Price + US, data = Carseats)   # refit without Urban
summary(lm.car)
## 
## Call:
## lm(formula = Sales ~ Price + US, data = Carseats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9269 -1.6286 -0.0574  1.5766  7.0515 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.03079    0.63098  20.652  < 2e-16 ***
## Price       -0.05448    0.00523 -10.416  < 2e-16 ***
## USYes        1.19964    0.25846   4.641 4.71e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2354 
## F-statistic: 62.43 on 2 and 397 DF,  p-value: < 2.2e-16

10f) How well do the models in (a) and (e) fit the data?

Not especially well: each model explains only about 23% of the variance in Sales (R² ≈ 0.239). Dropping Urban costs essentially nothing; the smaller model even has a slightly higher adjusted R² (0.2354 vs. 0.2335).

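This can be confirmed with a nested-model F-test (the name lm.car.full below is mine, used to refit model (a), since lm.car was overwritten above):

lm.car.full = lm(Sales ~ Price + Urban + US, data = Carseats)
anova(lm.car, lm.car.full)   # tests whether Urban adds anything; it does not
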
10g) Using the model from (e), obtain 95% confidence intervals for the coefficient(s).

confint(lm.car)
##                   2.5 %      97.5 %
## (Intercept) 11.79032020 14.27126531
## Price       -0.06475984 -0.04419543
## USYes        0.69151957  1.70776632

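None of the intervals contains zero, which is consistent with the hypothesis tests in (d).
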
10h) Is there evidence of outliers or high-leverage observations in the model from (e)? One way to check is sketched below.

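A minimal sketch, using studentized residuals for outliers and hat values for leverage (the |r| > 3 and 2(p + 1)/n cutoffs are conventional rules of thumb, not requirements of the exercise):

par(mfrow = c(2, 2))
plot(lm.car)                                  # the usual four diagnostic plots
sum(abs(rstudent(lm.car)) > 3)                # count candidate outliers
p = 2; n = nrow(Carseats)
sum(hatvalues(lm.car) > 2 * (p + 1) / n)      # count high-leverage observations
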
## Number 12)

# The End so far…