Hannah Hon
auto = read.table('https://www-bcf.usc.edu/~gareth/ISL/Auto.data',header = T, na.strings = '?')
auto$origin = factor(auto$origin, 1:3, c('US','Europe','Japan'))
head(auto)
## mpg cylinders displacement horsepower weight acceleration year origin
## 1 18 8 307 130 3504 12.0 70 US
## 2 15 8 350 165 3693 11.5 70 US
## 3 18 8 318 150 3436 11.0 70 US
## 4 16 8 304 150 3433 12.0 70 US
## 5 17 8 302 140 3449 10.5 70 US
## 6 15 8 429 198 4341 10.0 70 US
## name
## 1 chevrolet chevelle malibu
## 2 buick skylark 320
## 3 plymouth satellite
## 4 amc rebel sst
## 5 ford torino
## 6 ford galaxie 500
fit = lm(mpg ~ horsepower, auto)
summary(fit)
##
## Call:
## lm(formula = mpg ~ horsepower, data = auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.5710 -3.2592 -0.3435 2.7630 16.9240
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.935861 0.717499 55.66 <2e-16 ***
## horsepower -0.157845 0.006446 -24.49 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.906 on 390 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.6059, Adjusted R-squared: 0.6049
## F-statistic: 599.7 on 1 and 390 DF, p-value: < 2.2e-16
plot(fit)



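plot(mpg ~ horsepower, auto)  # draw the scatterplot first so abline() overlays the fitted line on the data
abline(fit)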

## a. mpg = 39.935861 - 0.157845 * horsepower
## b. The slope shows that as horsepower increases by 1 unit, mpg decreases by 0.158 units on average.
## c. The standard error of the slope is 0.006446.
## d. The residual standard error shows that the observed mpg is typically about 4.906 away from the mpg predicted by the model.
## e. There is a significant relationship between horsepower and mpg: the p-value for the slope is less than 2e-16, well below 0.05, and the F-test is significant.
## f. 60.59% of the variation in mpg can be explained by horsepower.
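The test statistics quoted in answers c and e can be checked by hand. The following is a minimal sketch that only reuses numbers copied from the summary(fit) output above; the variable names are chosen here for illustration and do not appear in the original analysis.

```r
# Values copied from the summary(fit) output above
slope    <- -0.157845   # estimated coefficient of horsepower
se_slope <- 0.006446    # standard error of that coefficient
df_resid <- 390         # residual degrees of freedom

t_slope <- slope / se_slope                  # t statistic for H0: slope = 0
f_stat  <- t_slope^2                         # with a single predictor, F equals t squared
p_value <- 2 * pt(-abs(t_slope), df_resid)   # two-sided p-value from the t distribution

print(c(t = t_slope, F = f_stat))
print(p_value < 0.05)
```

The results should agree with the reported t value and F-statistic up to the rounding of the copied inputs.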
predict(fit, data.frame(horsepower = 98))
## 1
## 24.46708
predict(fit, data.frame(horsepower = 98), interval="prediction", level = 0.95)
## fit lwr upr
## 1 24.46708 14.8094 34.12476
predict(fit, data.frame(horsepower = 98), interval="confidence", level = 0.99)
## fit lwr upr
## 1 24.46708 23.81669 25.11747
confint(fit, 'horsepower', level = 0.90)
## 5 % 95 %
## horsepower -0.1684719 -0.1472176
## g. 39.935861 - 0.157845 * 98 = 24.467
## h. The 95% prediction interval for mpg at horsepower = 98 is (14.8094, 34.12476).
## i. The 99% confidence interval for the mean mpg at horsepower = 98 is (23.82, 25.12).
## j. The 90% confidence interval for the slope is (-0.1684719, -0.1472176).
## k. The residual plots show non-linearity in the data, but the residuals still appear homoscedastic.
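As a cross-check on answers g and j, the point prediction and the slope interval can be rebuilt from the coefficient table alone. This is a rough sketch under the assumption that the rounded values printed by summary(fit) are precise enough; the variable names are illustrative only.

```r
# Values copied from the summary(fit) output above
intercept <- 39.935861
slope     <- -0.157845
se_slope  <- 0.006446
df_resid  <- 390

# Point prediction at horsepower = 98
pred_98 <- intercept + slope * 98

# A 90% two-sided interval leaves 5% in each tail, hence the 0.95 quantile
t_crit   <- qt(0.95, df_resid)
ci_slope <- slope + c(-1, 1) * t_crit * se_slope

print(pred_98)
print(ci_slope)
```

Both results should match predict() and confint() to within the rounding of the copied coefficients.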
Question 2
plot(auto, pch = '.')

round(cor(auto[,1:7], use = 'pairwise.complete.obs'), 4)
## mpg cylinders displacement horsepower weight
## mpg 1.0000 -0.7763 -0.8044 -0.7784 -0.8317
## cylinders -0.7763 1.0000 0.9509 0.8430 0.8970
## displacement -0.8044 0.9509 1.0000 0.8973 0.9331
## horsepower -0.7784 0.8430 0.8973 1.0000 0.8645
## weight -0.8317 0.8970 0.9331 0.8645 1.0000
## acceleration 0.4223 -0.5041 -0.5442 -0.6892 -0.4195
## year 0.5815 -0.3467 -0.3698 -0.4164 -0.3079
## acceleration year
## mpg 0.4223 0.5815
## cylinders -0.5041 -0.3467
## displacement -0.5442 -0.3698
## horsepower -0.6892 -0.4164
## weight -0.4195 -0.3079
## acceleration 1.0000 0.2829
## year 0.2829 1.0000
fit = lm(mpg ~ ., auto[,1:8])
plot(fit)




summary(fit)
##
## Call:
## lm(formula = mpg ~ ., data = auto[, 1:8])
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.0095 -2.0785 -0.0982 1.9856 13.3608
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.795e+01 4.677e+00 -3.839 0.000145 ***
## cylinders -4.897e-01 3.212e-01 -1.524 0.128215
## displacement 2.398e-02 7.653e-03 3.133 0.001863 **
## horsepower -1.818e-02 1.371e-02 -1.326 0.185488
## weight -6.710e-03 6.551e-04 -10.243 < 2e-16 ***
## acceleration 7.910e-02 9.822e-02 0.805 0.421101
## year 7.770e-01 5.178e-02 15.005 < 2e-16 ***
## originEurope 2.630e+00 5.664e-01 4.643 4.72e-06 ***
## originJapan 2.853e+00 5.527e-01 5.162 3.93e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.307 on 383 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.8242, Adjusted R-squared: 0.8205
## F-statistic: 224.5 on 8 and 383 DF, p-value: < 2.2e-16
## a. The correlation between mpg and displacement is -0.8044, so there is a strong negative linear relationship between displacement and mpg.
## b. They are highly and negatively correlated, which means that an increase in displacement is associated with a decrease in mpg.
## c. Yes, because the p-value of the F-statistic is less than 2.2e-16.
## d. Displacement, weight, year, originEurope, and originJapan have statistically significant relationships with mpg (p-values below 0.05).
## e. It suggests that a one-unit increase in year is associated with a 0.777 (7.770e-01) increase in mpg on average, holding the other predictors constant.
## f. The slope coefficient for displacement means that as displacement increases by 1 unit, mpg increases by 0.02398 on average, holding the other variables constant.
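Answers a and f can look contradictory: displacement is negatively correlated with mpg, yet its coefficient in the full model is positive. The sketch below uses simulated data (not the Auto data) to illustrate how a slope can change sign once a strongly correlated predictor is held constant. The names x1, x2 and y and all the simulation settings are hypothetical.

```r
set.seed(1)                              # make the simulation reproducible
n  <- 500
x1 <- rnorm(n)                           # plays the role of displacement
x2 <- x1 + rnorm(n, sd = 0.3)            # strongly correlated with x1, like weight
y  <- 1 * x1 - 2 * x2 + rnorm(n, sd = 0.5)

marginal <- coef(lm(y ~ x1))["x1"]       # slope of x1 when x2 is ignored
partial  <- coef(lm(y ~ x1 + x2))["x1"]  # slope of x1 holding x2 constant

print(c(correlation = cor(x1, y), marginal = unname(marginal), partial = unname(partial)))
```

In this setup the marginal slope is negative because x1 carries the effect of x2 along with it, while the partial slope recovers the positive direct effect.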
Question 3
fit2 = lm(mpg ~ cylinders + displacement+ weight+ year, auto)
summary(fit2)
##
## Call:
## lm(formula = mpg ~ cylinders + displacement + weight + year,
## data = auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.995 -2.270 -0.165 2.053 14.368
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -14.076941 4.055159 -3.471 0.000575 ***
## cylinders -0.289589 0.329225 -0.880 0.379611
## displacement 0.004973 0.006701 0.742 0.458425
## weight -0.006702 0.000572 -11.717 < 2e-16 ***
## year 0.764751 0.050684 15.089 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.436 on 392 degrees of freedom
## Multiple R-squared: 0.8091, Adjusted R-squared: 0.8072
## F-statistic: 415.5 on 4 and 392 DF, p-value: < 2.2e-16
library(car)
## Loading required package: carData
vif(fit2)
## cylinders displacement weight year
## 10.524432 16.406259 7.888061 1.173000
fit3 = lm(mpg ~ cylinders + displacement+ year, auto)
summary(fit3)
##
## Call:
## lm(formula = mpg ~ cylinders + displacement + year, data = auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.0801 -2.6445 -0.2925 2.1004 14.9103
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -18.199719 4.688296 -3.882 0.000122 ***
## cylinders -0.620910 0.380657 -1.631 0.103658
## displacement -0.041545 0.006265 -6.632 1.1e-10 ***
## year 0.699324 0.058461 11.962 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.988 on 393 degrees of freedom
## Multiple R-squared: 0.7423, Adjusted R-squared: 0.7403
## F-statistic: 377.3 on 3 and 393 DF, p-value: < 2.2e-16
fit4 = lm(mpg ~ cylinders + year, auto)
summary(fit4)
##
## Call:
## lm(formula = mpg ~ cylinders + year, data = auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.6462 -2.8847 -0.1399 2.5095 15.6875
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.30285 4.93534 -3.506 0.000507 ***
## cylinders -3.00405 0.13223 -22.718 < 2e-16 ***
## year 0.75289 0.06098 12.347 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.2 on 394 degrees of freedom
## Multiple R-squared: 0.7135, Adjusted R-squared: 0.712
## F-statistic: 490.5 on 2 and 394 DF, p-value: < 2.2e-16
## a. The significance codes for weight and year (***) show that their p-values are less than 0.001, indicating that the two estimated coefficients are significantly different from 0. Cylinders and displacement have no significance code (p-values of 0.380 and 0.458), indicating that their estimated coefficients are not significantly different from 0. The value of R2 is 0.8091, which means that 80.91% of the variation in mpg can be explained by cylinders, displacement, weight and year combined.
## b. The VIFs for cylinders and displacement are both larger than 10, which indicates high multicollinearity. The variances of their estimated coefficients are inflated, so the estimates are imprecise and the individual t-tests are unreliable.
## c. The estimated coefficient of displacement became significantly different from 0 and its absolute value increased (from 0.004973 to 0.041545), while that of year decreased (from 0.7648 to 0.6993). The R-squared is 0.7423 (adjusted R-squared 0.7403), which means that 74.23% of the variation in mpg can be explained by cylinders, displacement and year together.
## d. The estimated coefficient of cylinders became significantly different from 0, and its absolute value also increased (from 0.6209 to 3.0041). The R-squared is 0.7135, which means that 71.35% of the variation in mpg can be explained by cylinders and year together.
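To make the VIF in answer b less of a black box, here is a minimal sketch of how it can be computed without the car package: the VIF of a predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on the other predictors. The example uses the built-in mtcars data as a stand-in, which is an assumption made so the code runs without downloading the Auto data; the numbers will therefore differ from the vif(fit2) output above.

```r
# mtcars ships with base R; it is only a stand-in for the Auto data here
X <- mtcars[, c("cyl", "disp", "wt")]

# VIF for disp: regress it on the other predictors and use 1 / (1 - R^2)
r2_disp  <- summary(lm(disp ~ cyl + wt, data = mtcars))$r.squared
vif_disp <- 1 / (1 - r2_disp)

# Equivalent shortcut: the diagonal of the inverse correlation matrix of the predictors
vif_all <- diag(solve(cor(X)))

print(vif_disp)
print(vif_all)
```

The two approaches give the same value for disp, which shows that VIF depends only on how well each predictor is explained by the others, not on the response.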