auto <- read.table("http://faculty.marshall.usc.edu/gareth-james/ISL/Auto.data",
                   header = TRUE,
                   na.strings = "?")
auto <- na.omit(auto)
attach(auto)
mod <- lm(mpg ~ horsepower)
summary(mod)
##
## Call:
## lm(formula = mpg ~ horsepower)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -13.5710  -3.2592  -0.3435   2.7630  16.9240
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.935861   0.717499   55.66   <2e-16 ***
## horsepower  -0.157845   0.006446  -24.49   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.906 on 390 degrees of freedom
## Multiple R-squared: 0.6059, Adjusted R-squared: 0.6049
## F-statistic: 599.7 on 1 and 390 DF, p-value: < 2.2e-16
Yes. The slope on horsepower is significantly different from zero (t = -24.49, p < 2e-16), so there is a relationship between horsepower and mpg.
The relationship is fairly strong, as indicated by the R^2 value of 0.6059. About 61% of the variance in mpg can be accounted for by horsepower.
It is negative, as indicated by the negative sign on the estimate for horsepower. Each additional unit of horsepower is associated with a decrease of about 0.158 in predicted mpg.
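In simple linear regression, R^2 is the square of the correlation between the predictor and the response. For this data, the correlation between mpg and horsepower is -0.7784, and (-0.7784)^2 ≈ 0.6059, which matches the summary output. The sketch below checks the identity on R's built-in mtcars data, which is a stand-in and not the Auto data; `fit`, `r2_from_model`, and `r2_from_cor` are illustrative names.

```r
# Stand-in data: built-in mtcars (NOT the Auto data used above).
# hp plays the role of horsepower.
fit <- lm(mpg ~ hp, data = mtcars)

# R^2 as reported by summary()
r2_from_model <- summary(fit)$r.squared

# Squared Pearson correlation between response and predictor
r2_from_cor <- cor(mtcars$mpg, mtcars$hp)^2

# The two agree up to floating-point error
all.equal(r2_from_model, r2_from_cor)
```

The identity holds only with a single predictor; with several predictors, R^2 is the squared correlation between the response and the fitted values.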
newdata<-data.frame(horsepower=c(98))
predict(mod, newdata, interval="confidence")
##        fit      lwr      upr
## 1 24.46708 23.97308 24.96108
predict(mod, newdata, interval = "prediction")
##        fit     lwr      upr
## 1 24.46708 14.8094 34.12476
The predicted mpg at a horsepower of 98 is 24.47. The 95% confidence interval for the mean mpg at that horsepower is (23.97, 24.96), and the 95% prediction interval for an individual car is (14.81, 34.12).
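The point prediction follows directly from the coefficients: 39.936 - 0.1578 × 98 ≈ 24.47. The prediction interval is wider than the confidence interval because it adds the residual variance of a single observation to the uncertainty in the fitted mean. A minimal sketch of both intervals computed by hand, again on the built-in mtcars data as a stand-in for Auto (all variable names here are illustrative):

```r
# Stand-in data: built-in mtcars (NOT the Auto data); hp stands in for horsepower.
fit <- lm(mpg ~ hp, data = mtcars)
x   <- mtcars$hp
n   <- length(x)
x0  <- 98                               # query point, as in the text

s     <- summary(fit)$sigma             # residual standard error
fit0  <- unname(coef(fit)[1] + coef(fit)[2] * x0)   # point prediction
tcrit <- qt(0.975, df = n - 2)          # two-sided 95% critical value

# Variance factor of the fitted mean at x0 (simple regression only)
h0 <- 1 / n + (x0 - mean(x))^2 / sum((x - mean(x))^2)

conf_int <- fit0 + c(-1, 1) * tcrit * s * sqrt(h0)       # mean response
pred_int <- fit0 + c(-1, 1) * tcrit * s * sqrt(1 + h0)   # single new car

# Reference values from predict()
new_obs  <- data.frame(hp = x0)
conf_ref <- predict(fit, new_obs, interval = "confidence")
pred_ref <- predict(fit, new_obs, interval = "prediction")
```

The extra `1 +` under the square root is the only difference between the two intervals, which is why the prediction interval never shrinks below roughly ±2 residual standard errors regardless of sample size.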
plot(mpg ~ horsepower)
abline(mod)
plot(mpg, mod$residuals)
abline(h=0)
qqnorm(mod$residuals)
qqline(mod$residuals)
hist(mod$residuals)
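The Q-Q plot and histogram indicate a right skew in the residuals. Additionally, the spread of the residuals varies with the level of mpg, which suggests non-constant variance. The residuals are largely negative at low mpg and largely positive at high mpg. Part of that trend is expected, because least-squares residuals are always positively correlated with the observed response, so a plot against the fitted values (mod$fitted.values) is the more reliable check.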
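Least-squares residuals are uncorrelated with the fitted values by construction, but their correlation with the observed response is sqrt(1 - R^2), which for the fit above is sqrt(1 - 0.6059) ≈ 0.63. The sketch below illustrates this and draws a residuals-versus-fitted plot, using the built-in mtcars data as a stand-in for Auto (variable names are illustrative):

```r
# Stand-in data: built-in mtcars (NOT the Auto data); hp stands in for horsepower.
fit <- lm(mpg ~ hp, data = mtcars)
res <- residuals(fit)
r2  <- summary(fit)$r.squared

# Correlation with the observed response is positive by construction ...
cor_with_response <- cor(res, mtcars$mpg)
# ... while correlation with the fitted values is zero up to rounding
cor_with_fitted <- cor(res, fitted(fit))

c(cor_with_response, sqrt(1 - r2), cor_with_fitted)

# Residuals against fitted values: any remaining pattern reflects model misfit
plot(fitted(fit), res, xlab = "Fitted mpg", ylab = "Residual")
abline(h = 0)
```

Calling `plot(fit, which = 1)` on an lm object produces the same kind of plot with a smoothed trend line added.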
# Drop origin (column 8) and name (column 9) so only the quantitative columns remain
auto <- auto[, -c(8:9)]
pairs(auto)
cor(auto)
##                     mpg  cylinders displacement horsepower     weight
## mpg           1.0000000 -0.7776175   -0.8051269 -0.7784268 -0.8322442
## cylinders    -0.7776175  1.0000000    0.9508233  0.8429834  0.8975273
## displacement -0.8051269  0.9508233    1.0000000  0.8972570  0.9329944
## horsepower   -0.7784268  0.8429834    0.8972570  1.0000000  0.8645377
## weight       -0.8322442  0.8975273    0.9329944  0.8645377  1.0000000
## acceleration  0.4233285 -0.5046834   -0.5438005 -0.6891955 -0.4168392
## year          0.5805410 -0.3456474   -0.3698552 -0.4163615 -0.3091199
##              acceleration       year
## mpg             0.4233285  0.5805410
## cylinders      -0.5046834 -0.3456474
## displacement   -0.5438005 -0.3698552
## horsepower     -0.6891955 -0.4163615
## weight         -0.4168392 -0.3091199
## acceleration    1.0000000  0.2903161
## year            0.2903161  1.0000000
mlr_mod <- lm(mpg ~ cylinders + displacement + horsepower + weight + acceleration + year)
summary(mlr_mod)
##
## Call:
## lm(formula = mpg ~ cylinders + displacement + horsepower + weight +
## acceleration + year)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -8.6927 -2.3864 -0.0801  2.0291 14.3607
##
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -1.454e+01  4.764e+00  -3.051  0.00244 **
## cylinders    -3.299e-01  3.321e-01  -0.993  0.32122
## displacement  7.678e-03  7.358e-03   1.044  0.29733
## horsepower   -3.914e-04  1.384e-02  -0.028  0.97745
## weight       -6.795e-03  6.700e-04 -10.141  < 2e-16 ***
## acceleration  8.527e-02  1.020e-01   0.836  0.40383
## year          7.534e-01  5.262e-02  14.318  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.435 on 385 degrees of freedom
## Multiple R-squared: 0.8093, Adjusted R-squared: 0.8063
## F-statistic: 272.2 on 6 and 385 DF, p-value: < 2.2e-16
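# Response as an n x 1 matrix
Y <- as.matrix(mpg)
n <- dim(Y)[1]
# Design matrix: a column of ones for the intercept, then the six predictors
X <- matrix(c(rep(1, n),
              cylinders,
              displacement,
              horsepower,
              weight,
              acceleration,
              year),
            ncol = 7,
            byrow = FALSE)
# Least-squares estimate from the normal equations: (X'X)^{-1} X'Y
betaHat <- solve(t(X) %*% X) %*% t(X) %*% Y
betaHat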
## [,1]
## [1,] -1.453525e+01
## [2,] -3.298591e-01
## [3,] 7.678430e-03
## [4,] -3.913556e-04
## [5,] -6.794618e-03
## [6,] 8.527325e-02
## [7,] 7.533672e-01
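These values match the coefficients reported by summary(mlr_mod) above, as expected, since lm() computes the same least-squares estimate. As a hedged alternative, the sketch below builds the design matrix with model.matrix() and solves the normal equations without forming the inverse explicitly. It uses the built-in mtcars data as a stand-in for Auto, with a smaller, illustrative set of predictors; `beta_hat` and the other names are not from the original.

```r
# Stand-in data: built-in mtcars (NOT the Auto data), with illustrative predictors.
form <- mpg ~ cyl + disp + hp + wt
fit  <- lm(form, data = mtcars)

# model.matrix() adds the intercept column and keeps the column names
X <- model.matrix(form, data = mtcars)
y <- mtcars$mpg

# Solve (X'X) b = X'y directly rather than computing solve(X'X) first;
# crossprod(X) is t(X) %*% X and crossprod(X, y) is t(X) %*% y
beta_hat <- drop(solve(crossprod(X), crossprod(X, y)))

# Side-by-side comparison with lm()
cbind(normal_equations = beta_hat, lm = coef(fit))
```

Solving the linear system is usually a little more accurate than multiplying by an explicit inverse; lm() itself goes a step further and uses a QR decomposition of X.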
plot(mpg, mlr_mod$residuals)
abline(h=0)
qqnorm(mlr_mod$residuals)
qqline(mlr_mod$residuals)
hist(mlr_mod$residuals)
The fit has problems similar to those of the simple linear regression fit. The residuals are right-skewed, the residual plot has a curved shape (suggesting non-linearity), and the spread of the residuals is not constant. There are some large residuals for cars with very high observed mpg.