Question: This question involves the use of simple linear regression on the Auto data set. (a) Use the lm() function to perform a simple linear regression with mpg as the response and horsepower as the predictor. Use the summary() function to print the results.
library(ISLR)
library(MASS)
data("Auto")
head(Auto)
## mpg cylinders displacement horsepower weight acceleration year origin
## 1 18 8 307 130 3504 12.0 70 1
## 2 15 8 350 165 3693 11.5 70 1
## 3 18 8 318 150 3436 11.0 70 1
## 4 16 8 304 150 3433 12.0 70 1
## 5 17 8 302 140 3449 10.5 70 1
## 6 15 8 429 198 4341 10.0 70 1
## name
## 1 chevrolet chevelle malibu
## 2 buick skylark 320
## 3 plymouth satellite
## 4 amc rebel sst
## 5 ford torino
## 6 ford galaxie 500
lm.fit <- lm(mpg ~ horsepower, data = Auto)
summary(lm.fit)
##
## Call:
## lm(formula = mpg ~ horsepower, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.5710 -3.2592 -0.3435 2.7630 16.9240
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.935861 0.717499 55.66 <2e-16 ***
## horsepower -0.157845 0.006446 -24.49 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.906 on 390 degrees of freedom
## Multiple R-squared: 0.6059, Adjusted R-squared: 0.6049
## F-statistic: 599.7 on 1 and 390 DF, p-value: < 2.2e-16
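Comment on the output. i. Is there a relationship between the predictor and the response? Yes. The p-value for the horsepower coefficient is below 2e-16 (and the F-statistic p-value is below 2.2e-16), so we reject the null hypothesis that the slope is zero.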
How strong is the relationship between the predictor and the response? The R^{2} value indicates that about 61% of the variation in the response (mpg) is explained by the predictor (horsepower). The residual standard error is 4.906 mpg on 390 degrees of freedom.
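Is the relationship between the predictor and the response positive or negative? Negative. The horsepower coefficient is -0.158, so each additional unit of horsepower is associated with a decrease of about 0.158 mpg.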
What is the predicted mpg associated with a horsepower of 98? What are the associated 95% confidence and prediction intervals?
predict(lm.fit, data.frame(horsepower = 98), interval = "prediction")
## fit lwr upr
## 1 24.46708 14.8094 34.12476
predict(lm.fit, data.frame(horsepower = 98), interval = "confidence")
## fit lwr upr
## 1 24.46708 23.97308 24.96108
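The two intervals above can be reproduced by hand, which makes the difference between them explicit. The sketch below assumes the usual simple-linear-regression formulas for the standard error of a mean response and of a single new observation; the names x0, se_mean, se_pred, conf_by_hand and pred_by_hand are illustrative and not part of the original solution.

```r
library(ISLR)

fit <- lm(mpg ~ horsepower, data = Auto)

x0    <- 98                               # horsepower value of interest
n     <- nrow(Auto)
s     <- summary(fit)$sigma               # residual standard error
xbar  <- mean(Auto$horsepower)
sxx   <- sum((Auto$horsepower - xbar)^2)  # sum of squares of the predictor
tcrit <- qt(0.975, df = n - 2)            # two-sided 95% critical value

# Point prediction: intercept + slope * x0
y0 <- unname(coef(fit)[1] + coef(fit)[2] * x0)

# Standard error of the mean response (confidence interval)
se_mean <- s * sqrt(1 / n + (x0 - xbar)^2 / sxx)
# Standard error for a single new observation (prediction interval)
se_pred <- s * sqrt(1 + 1 / n + (x0 - xbar)^2 / sxx)

conf_by_hand <- c(fit = y0, lwr = y0 - tcrit * se_mean, upr = y0 + tcrit * se_mean)
pred_by_hand <- c(fit = y0, lwr = y0 - tcrit * se_pred, upr = y0 + tcrit * se_pred)

print(conf_by_hand)
print(pred_by_hand)
```

The prediction interval is wider because it adds the error of a single car (the extra 1 under the square root) to the uncertainty in the fitted line.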
attach(Auto)
plot(horsepower,mpg)
abline(lm.fit,lwd=5,col="blue")
which.max(hatvalues(lm.fit))
## 117
## 116
par(mfrow = c(2,2))
plot(lm.fit)
pairs(Auto)
Auto$name <- NULL
cor(Auto, method = "pearson")
## mpg cylinders displacement horsepower weight
## mpg 1.0000000 -0.7776175 -0.8051269 -0.7784268 -0.8322442
## cylinders -0.7776175 1.0000000 0.9508233 0.8429834 0.8975273
## displacement -0.8051269 0.9508233 1.0000000 0.8972570 0.9329944
## horsepower -0.7784268 0.8429834 0.8972570 1.0000000 0.8645377
## weight -0.8322442 0.8975273 0.9329944 0.8645377 1.0000000
## acceleration 0.4233285 -0.5046834 -0.5438005 -0.6891955 -0.4168392
## year 0.5805410 -0.3456474 -0.3698552 -0.4163615 -0.3091199
## origin 0.5652088 -0.5689316 -0.6145351 -0.4551715 -0.5850054
## acceleration year origin
## mpg 0.4233285 0.5805410 0.5652088
## cylinders -0.5046834 -0.3456474 -0.5689316
## displacement -0.5438005 -0.3698552 -0.6145351
## horsepower -0.6891955 -0.4163615 -0.4551715
## weight -0.4168392 -0.3091199 -0.5850054
## acceleration 1.0000000 0.2903161 0.2127458
## year 0.2903161 1.0000000 0.1815277
## origin 0.2127458 0.1815277 1.0000000
lm.fit <- lm(mpg ~ ., data = Auto)
summary(lm.fit)
##
## Call:
## lm(formula = mpg ~ ., data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.5903 -2.1565 -0.1169 1.8690 13.0604
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.218435 4.644294 -3.707 0.00024 ***
## cylinders -0.493376 0.323282 -1.526 0.12780
## displacement 0.019896 0.007515 2.647 0.00844 **
## horsepower -0.016951 0.013787 -1.230 0.21963
## weight -0.006474 0.000652 -9.929 < 2e-16 ***
## acceleration 0.080576 0.098845 0.815 0.41548
## year 0.750773 0.050973 14.729 < 2e-16 ***
## origin 1.426141 0.278136 5.127 4.67e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.328 on 384 degrees of freedom
## Multiple R-squared: 0.8215, Adjusted R-squared: 0.8182
## F-statistic: 252.4 on 7 and 384 DF, p-value: < 2.2e-16
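Comment on the output. For instance: i. Is there a relationship between the predictors and the response? Yes. The F-statistic is 252.4 with a p-value below 2.2e-16, so at least one predictor is related to mpg. ii. Which predictors appear to have a statistically significant relationship to the response? displacement, weight, year, and origin (p-values below 0.05). cylinders, horsepower, and acceleration are not significant in this model.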
What does the coefficient for the year variable suggest? With every other predictor held constant, mpg increases by about 0.75 for each additional model year.
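One way to read this coefficient is to compare predictions for two otherwise identical cars that differ by one model year. This is a minimal sketch, assuming the same multiple regression as above; auto_num, car, newer and year_effect are illustrative names, not part of the original solution.

```r
library(ISLR)

# Drop the non-numeric name column, as the text does with Auto$name <- NULL
auto_num <- subset(Auto, select = -name)
fit <- lm(mpg ~ ., data = auto_num)

car   <- auto_num[1, ]                    # any row works as a reference car
newer <- transform(car, year = year + 1)  # same car, one model year later

# The difference in predicted mpg equals the year coefficient
year_effect <- unname(predict(fit, newer) - predict(fit, car))
print(year_effect)
print(coef(fit)["year"])

# 95% confidence interval for the year coefficient
print(confint(fit, "year"))
```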
which.max(hatvalues(lm.fit))
## 14
## 14
par(mfrow = c(2,2))
plot(lm.fit)
Do the residual plots suggest any unusually large outliers? Does the leverage plot identify any observations with unusually high leverage? The first graph (residuals vs. fitted) shows that there is a non-linear relationship between the response and the predictors. The second graph (normal Q-Q) shows that the residuals are close to normally distributed but right-skewed. The third graph (scale-location) shows that the constant-variance assumption for the errors does not hold for this model. The fourth graph (residuals vs. leverage) shows no high-leverage points apart from one observation that stands out (labeled 14 on the graph), consistent with the which.max(hatvalues(lm.fit)) result above.
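The visual reading of the leverage plot can be backed with numbers. The sketch below uses two common rules of thumb, which are assumptions rather than part of the original solution: leverage above twice the average value p/n, and absolute studentized residuals above 3. The names auto_num, high_leverage and outliers are illustrative.

```r
library(ISLR)

auto_num <- subset(Auto, select = -name)
fit <- lm(mpg ~ ., data = auto_num)

h <- hatvalues(fit)      # leverage of each observation
p <- length(coef(fit))   # number of coefficients, intercept included
n <- nrow(auto_num)

# Leverage values sum to p, so the average leverage is p / n
high_leverage <- which(h > 2 * p / n)

# Studentized residuals: flag observations beyond +/- 3 as possible outliers
r <- rstudent(fit)
outliers <- which(abs(r) > 3)

print(which.max(h))      # observation with the largest leverage
print(high_leverage)
print(outliers)
```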
lm.fit <- lm(mpg ~ . - name + displacement:weight, data = Auto)
summary(lm.fit)
##
## Call:
## lm(formula = mpg ~ . - name + displacement:weight, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.9027 -1.8092 -0.0946 1.5549 12.1687
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.389e+00 4.301e+00 -1.253 0.2109
## cylinders 1.175e-01 2.943e-01 0.399 0.6899
## displacement -6.837e-02 1.104e-02 -6.193 1.52e-09 ***
## horsepower -3.280e-02 1.238e-02 -2.649 0.0084 **
## weight -1.064e-02 7.136e-04 -14.915 < 2e-16 ***
## acceleration 6.724e-02 8.805e-02 0.764 0.4455
## year 7.852e-01 4.553e-02 17.246 < 2e-16 ***
## origin 5.610e-01 2.622e-01 2.139 0.0331 *
## displacement:weight 2.269e-05 2.257e-06 10.054 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.964 on 383 degrees of freedom
## Multiple R-squared: 0.8588, Adjusted R-squared: 0.8558
## F-statistic: 291.1 on 8 and 383 DF, p-value: < 2.2e-16
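Whether the displacement:weight interaction improves on the additive model can also be checked with a nested-model F-test. This is a minimal sketch; auto_num, base, inter and cmp are illustrative names. Because only one term is added, the F-statistic from anova() equals the square of the interaction's t value in the summary above.

```r
library(ISLR)

auto_num <- subset(Auto, select = -name)

base  <- lm(mpg ~ ., data = auto_num)                        # additive model
inter <- lm(mpg ~ . + displacement:weight, data = auto_num)  # adds the interaction

# F-test of the additive model against the model with the interaction
cmp <- anova(base, inter)
print(cmp)

# t value of the interaction term, for comparison with sqrt(F)
t_inter <- summary(inter)$coefficients["displacement:weight", "t value"]
print(t_inter^2)
```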
lm.fit <- lm(mpg ~ . - name + I((displacement)^2) + log(displacement) + displacement:weight, data = Auto)
summary(lm.fit)
##
## Call:
## lm(formula = mpg ~ . - name + I((displacement)^2) + log(displacement) +
## displacement:weight, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.7453 -1.8071 0.0077 1.5523 12.2398
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.372e+01 2.127e+01 -2.056 0.040508 *
## cylinders 6.809e-01 3.756e-01 1.813 0.070618 .
## displacement -1.965e-01 6.336e-02 -3.101 0.002073 **
## horsepower -4.658e-02 1.390e-02 -3.351 0.000886 ***
## weight -9.389e-03 1.415e-03 -6.633 1.13e-10 ***
## acceleration 4.618e-02 8.993e-02 0.514 0.607885
## year 7.673e-01 4.596e-02 16.696 < 2e-16 ***
## origin 5.165e-01 2.713e-01 1.904 0.057702 .
## I((displacement)^2) 1.737e-04 7.263e-05 2.391 0.017291 *
## log(displacement) 1.046e+01 5.796e+00 1.805 0.071801 .
## displacement:weight 1.889e-05 4.645e-06 4.067 5.78e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.949 on 381 degrees of freedom
## Multiple R-squared: 0.8609, Adjusted R-squared: 0.8572
## F-statistic: 235.7 on 10 and 381 DF, p-value: < 2.2e-16
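The three multiple-regression fits above can be compared side by side. This is a minimal sketch that assumes adjusted R-squared as the yardstick (the original solution does not name one); the list and model names are illustrative.

```r
library(ISLR)

auto_num <- subset(Auto, select = -name)

models <- list(
  additive    = lm(mpg ~ ., data = auto_num),
  interaction = lm(mpg ~ . + displacement:weight, data = auto_num),
  transformed = lm(mpg ~ . + I(displacement^2) + log(displacement) +
                     displacement:weight, data = auto_num)
)

# Adjusted R-squared penalizes extra terms, so it is comparable across models
adj_r2 <- sapply(models, function(m) summary(m)$adj.r.squared)
print(round(adj_r2, 4))
```

In the summaries above, adjusted R-squared rises from 0.8182 to 0.8558 when the interaction is added, but only to 0.8572 with the extra displacement terms, so the squared and log transformations add little once the interaction is in the model.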