#Chapter 3 exercises
##Numba 2)
What is the difference between KNN regression and the KNN classifier? KNN regression first identifies the K training observations closest to the prediction point and then predicts the average of their responses; changing the value of K changes how many neighbors are found and averaged. The KNN classifier finds the K closest training observations in the same way, but instead of averaging a numeric response it predicts the most common class among those neighbors, so the prediction is a majority vote over the neighbors rather than an average.
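To make the contrast concrete, here is a minimal base-R sketch on made-up toy data (the variable names and data are purely illustrative): both methods find the same K nearest neighbors, and only the final prediction step differs.

set.seed(1)
x.train <- matrix(rnorm(40), ncol = 2)            # 20 hypothetical training points, 2 predictors
y.num <- rnorm(20)                                # numeric response (regression)
y.cls <- sample(c("A", "B"), 20, replace = TRUE)  # class labels (classification)
x0 <- c(0, 0)                                     # point we want a prediction for
k <- 5
d <- sqrt(colSums((t(x.train) - x0)^2))           # distance from x0 to every training point
nearest <- order(d)[1:k]                          # indices of the k closest points
mean(y.num[nearest])                              # KNN regression: average the neighbors' responses
names(which.max(table(y.cls[nearest])))           # KNN classification: majority vote among neighbors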
##Numba 9)
9a) Produce a scatterplot matrix that includes all of the variables in the data set.
plot(Auto)  # plot() on a data frame produces a scatterplot matrix, equivalent to pairs(Auto)
9b) Compute the matrix of correlations between the variables using cor(), excluding the qualitative name variable.
as.numeric(Auto$horsepower)  # confirm horsepower is a numeric vector
## [1] 130 165 150 150 140 198 220 215 225 190 170 160 150 225 95 95 97 85
## ... (remaining output truncated; 392 values in total)
cor(Auto[sapply(Auto, is.numeric)])  # correlations among the numeric columns only (drops name)
## mpg cylinders displacement horsepower weight
## mpg 1.0000000 -0.7776175 -0.8051269 -0.7784268 -0.8322442
## cylinders -0.7776175 1.0000000 0.9508233 0.8429834 0.8975273
## displacement -0.8051269 0.9508233 1.0000000 0.8972570 0.9329944
## horsepower -0.7784268 0.8429834 0.8972570 1.0000000 0.8645377
## weight -0.8322442 0.8975273 0.9329944 0.8645377 1.0000000
## acceleration 0.4233285 -0.5046834 -0.5438005 -0.6891955 -0.4168392
## year 0.5805410 -0.3456474 -0.3698552 -0.4163615 -0.3091199
## origin 0.5652088 -0.5689316 -0.6145351 -0.4551715 -0.5850054
## acceleration year origin
## mpg 0.4233285 0.5805410 0.5652088
## cylinders -0.5046834 -0.3456474 -0.5689316
## displacement -0.5438005 -0.3698552 -0.6145351
## horsepower -0.6891955 -0.4163615 -0.4551715
## weight -0.4168392 -0.3091199 -0.5850054
## acceleration 1.0000000 0.2903161 0.2127458
## year 0.2903161 1.0000000 0.1815277
## origin 0.2127458 0.1815277 1.0000000
9c) Perform a multiple linear regression with mpg as the response and all other variables except name as the predictors, and comment on the output.
lm.fit <- lm(mpg ~ . - name, data = Auto)
summary(lm.fit)
##
## Call:
## lm(formula = mpg ~ . - name, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.5903 -2.1565 -0.1169 1.8690 13.0604
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.218435 4.644294 -3.707 0.00024 ***
## cylinders -0.493376 0.323282 -1.526 0.12780
## displacement 0.019896 0.007515 2.647 0.00844 **
## horsepower -0.016951 0.013787 -1.230 0.21963
## weight -0.006474 0.000652 -9.929 < 2e-16 ***
## acceleration 0.080576 0.098845 0.815 0.41548
## year 0.750773 0.050973 14.729 < 2e-16 ***
## origin 1.426141 0.278136 5.127 4.67e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.328 on 384 degrees of freedom
## Multiple R-squared: 0.8215, Adjusted R-squared: 0.8182
## F-statistic: 252.4 on 7 and 384 DF, p-value: < 2.2e-16
Yes: displacement, weight, year, and origin all have statistically significant relationships with mpg, while cylinders, horsepower, and acceleration do not. The year coefficient of about 0.75 means that, holding the other predictors fixed, mpg increases by roughly 0.75 for each additional model year, so newer cars are more fuel efficient; the relationship is positive.
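As a quick sanity check, the significant predictors can be read off the coefficient table programmatically (a small sketch using the fit above; note the intercept shows up in the list too):

coefs <- summary(lm.fit)$coefficients
rownames(coefs)[coefs[, "Pr(>|t|)"] < 0.05]  # rows with p-values below 0.05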
9d) Produce diagnostic plots of the linear regression fit and comment on any problems you see.
par(mfrow = c(2, 2))  # arrange the four diagnostic plots in a 2x2 grid
plot(lm.fit)
The residuals-vs-fitted plot suggests some non-linearity and a few large outliers in the upper tail. The leverage plot shows one observation (labeled 14) with unusually high leverage; a change to that single point could noticeably change the fitted model.
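To pin down the points the plots flag, one could inspect leverage and studentized residuals directly (a sketch; output not shown):

which.max(hatvalues(lm.fit))                          # observation with the highest leverage
head(sort(abs(rstudent(lm.fit)), decreasing = TRUE))  # largest studentized residuals (outlier candidates)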
9e) Use the * and : symbols to fit linear regression models with interaction effects. Do any interactions appear to be statistically significant?
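A sketch of how one might start (the particular interaction terms below are illustrative choices, not prescribed by the exercise; output not shown):

lm.int <- lm(mpg ~ weight * year + displacement:horsepower, data = Auto)
summary(lm.int)  # check the p-values on the interaction rows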
9f) Try a few different transformations of the variables, such as \(\log(X)\), \(\sqrt{X}\), and \(X^2\), and comment on the findings.
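A sketch of trying a few transformations of horsepower (an illustrative choice of predictor; output not shown):

lm.log <- lm(mpg ~ log(horsepower), data = Auto)              # log transform
lm.sqrt <- lm(mpg ~ sqrt(horsepower), data = Auto)            # square-root transform
lm.sq <- lm(mpg ~ horsepower + I(horsepower^2), data = Auto)  # quadratic term
summary(lm.sq)  # compare R-squared and residual plots across the fits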
##Numba 10)
10a) Fit a multiple regression model to predict Sales using Price, Urban, and US.
lm.car <- lm(Sales ~ Price + Urban + US, data = Carseats)
plot(lm.car)  # diagnostic plots (also relevant to part h)
summary(lm.car)
##
## Call:
## lm(formula = Sales ~ Price + Urban + US, data = Carseats)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.9206 -1.6220 -0.0564 1.5786 7.0581
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.043469 0.651012 20.036 < 2e-16 ***
## Price -0.054459 0.005242 -10.389 < 2e-16 ***
## UrbanYes -0.021916 0.271650 -0.081 0.936
## USYes 1.200573 0.259042 4.635 4.86e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2335
## F-statistic: 41.52 on 3 and 396 DF, p-value: < 2.2e-16
10b) Provide an interpretation of each coefficient in the model. Be careful—some of the variables in the model are qualitative!
Price: each one-dollar increase in price is associated with a decrease of about 0.054 in Sales; since Sales is recorded in thousands of units, that is roughly 54 fewer carseats sold, holding the other predictors fixed. UrbanYes: urban stores sell about 22 fewer carseats than non-urban stores, but this effect is not significant (p = 0.936). USYes: stores in the US sell about 1,200 more carseats than non-US stores, a significant effect.
10c) Write out the model in equation form, being careful to handle the qualitative variables properly.
\(Sales = 13.043469 - 0.054459Price - 0.021916Urban_{Yes} + 1.200573US_{Yes}\)
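The Yes/No dummy coding behind \(Urban_{Yes}\) and \(US_{Yes}\) can be confirmed directly from the factors (a quick check):

contrasts(Carseats$Urban)  # shows the No = 0, Yes = 1 coding behind UrbanYes
contrasts(Carseats$US)     # likewise for USYes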
10d) For which of the predictors can you reject the null hypothesis \(H_0 : \beta_j = 0\)?
Price and US: both have very small p-values (< 2e-16 and 4.86e-06, respectively). The null cannot be rejected for Urban (p = 0.936).
10e) On the basis of your response to the previous question, fit a smaller model that only uses the predictors for which there is evidence of association with the outcome.
lm.car <- lm(Sales ~ Price + US, data = Carseats)
summary(lm.car)
##
## Call:
## lm(formula = Sales ~ Price + US, data = Carseats)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.9269 -1.6286 -0.0574 1.5766 7.0515
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.03079 0.63098 20.652 < 2e-16 ***
## Price -0.05448 0.00523 -10.416 < 2e-16 ***
## USYes 1.19964 0.25846 4.641 4.71e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2354
## F-statistic: 62.43 on 2 and 397 DF, p-value: < 2.2e-16
10f) How well do the models in (a) and (e) fit the data?
Not especially well: each model explains only about 23% of the variance in Sales (R-squared = 0.2393 for both), though the adjusted R-squared improves slightly (from 0.2335 to 0.2354) after dropping Urban.
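One way to put the two fits side by side is to compare their adjusted R-squared values (a sketch; lm.car.full refits the model from part (a)):

lm.car.full <- lm(Sales ~ Price + Urban + US, data = Carseats)
lm.car.small <- lm(Sales ~ Price + US, data = Carseats)
c(full = summary(lm.car.full)$adj.r.squared,   # model (a)
  small = summary(lm.car.small)$adj.r.squared) # model (e)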
10g) Using the model from (e), obtain 95% confidence intervals for the coefficient(s).
confint(lm.car)
## 2.5 % 97.5 %
## (Intercept) 11.79032020 14.27126531
## Price -0.06475984 -0.04419543
## USYes 0.69151957 1.70776632
10h) Is there evidence of outliers or high leverage observations in the model from (e)?
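A sketch of how one might check, using the diagnostics of the smaller fit (the thresholds are the usual rules of thumb; output not shown):

par(mfrow = c(2, 2))
plot(lm.car)                                          # standard diagnostic plots
sum(abs(rstudent(lm.car)) > 3)                        # count of large studentized residuals (outlier candidates)
sum(hatvalues(lm.car) > 2 * mean(hatvalues(lm.car)))  # observations with more than twice the average leverage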
##Numba 12)
#The End so far…