Problem 8 from Chapter 3 of ISLR

Question-a

Use the lm() function to perform a simple linear regression with mpg as the response and horsepower as the predictor. Use the summary() function to print the results.

library("ISLR")  # provides the Auto data set

# Simple linear regression: mpg (response) on horsepower (predictor)
lm.fit <- lm(mpg ~ horsepower, data = Auto)
summary(lm.fit)

## 
## Call:
## lm(formula = mpg ~ horsepower, data = Auto)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.5710  -3.2592  -0.3435   2.7630  16.9240 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 39.935861   0.717499   55.66   <2e-16 ***
## horsepower  -0.157845   0.006446  -24.49   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.906 on 390 degrees of freedom
## Multiple R-squared:  0.6059, Adjusted R-squared:  0.6049 
## F-statistic: 599.7 on 1 and 390 DF,  p-value: < 2.2e-16

i. Is there a relationship between the predictor and the response?

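Yes. The p-value for the horsepower coefficient is essentially zero (reported as < 2e-16), and so is the p-value of the F-statistic (599.7 on 1 and 390 DF). We can therefore reject the null hypothesis H0: β1 = 0 and conclude that there is a statistically significant relationship between horsepower and mpg.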
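As a cross-check, the relevant numbers can be pulled out of the summary object directly. The following is a minimal sketch (it refits lm.fit so that it runs on its own, and the variable names are illustrative, not from the original). With a single predictor, the F-statistic should equal the square of the slope's t-statistic, and the two tests should give the same p-value.

```r
library("ISLR")  # provides the Auto data set

lm.fit <- lm(mpg ~ horsepower, data = Auto)
fit_summary <- summary(lm.fit)

# Coefficient table columns: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- fit_summary$coefficients
t_slope <- coefs["horsepower", "t value"]
p_slope <- coefs["horsepower", "Pr(>|t|)"]

# Overall F-test of H0: beta_1 = 0 (named vector: value, numdf, dendf)
fstat <- fit_summary$fstatistic
p_F <- pf(fstat["value"], fstat["numdf"], fstat["dendf"], lower.tail = FALSE)

# With one predictor, F equals t^2 and both tests share the same p-value
c(t_squared = t_slope^2, F = unname(fstat["value"]))
c(p_slope = p_slope, p_F = unname(p_F))
```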

ii. How strong is the relationship between the predictor and the response?

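The R^{2} value of 0.6059 indicates that about 61% of the variability in the response (mpg) is explained by the predictor (horsepower). The residual standard error of 4.906 mpg gives the typical size of the deviation of an observed mpg from the regression line.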
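R^{2} can also be computed by hand as 1 - RSS/TSS, and in simple linear regression it equals the squared correlation between horsepower and mpg. A minimal sketch (variable names are illustrative) that checks both and also expresses the residual standard error as a fraction of the mean mpg, as a rough measure of relative prediction error:

```r
library("ISLR")  # provides the Auto data set

lm.fit <- lm(mpg ~ horsepower, data = Auto)

# R^2 = 1 - RSS / TSS
rss <- sum(residuals(lm.fit)^2)
tss <- sum((Auto$mpg - mean(Auto$mpg))^2)
r2_manual <- 1 - rss / tss

# In simple linear regression, R^2 is also the squared correlation of x and y
r2_cor <- cor(Auto$horsepower, Auto$mpg)^2

# Residual standard error relative to the mean response
rse <- summary(lm.fit)$sigma
rel_error <- rse / mean(Auto$mpg)

round(c(R2 = r2_manual, R2_from_cor = r2_cor, RSE = rse, RSE_over_mean = rel_error), 4)
```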

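iii. Is the relationship between the predictor and the response positive or negative?

The regression coefficient for horsepower is negative (-0.1578). Hence, the relationship is negative: each additional unit of horsepower is associated with an average decrease of about 0.158 mpg.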
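To check that the sign is not an artifact of sampling noise, a 95% confidence interval for the slope can be computed with confint(); if both bounds are negative, the data are consistent only with a negative relationship. A minimal sketch:

```r
library("ISLR")  # provides the Auto data set

lm.fit <- lm(mpg ~ horsepower, data = Auto)

# 95% confidence interval for the slope coefficient
slope_ci <- confint(lm.fit, "horsepower", level = 0.95)
slope_ci

# Estimated change in mean mpg for a 10-unit increase in horsepower
10 * coef(lm.fit)[["horsepower"]]
```

Multiplying the slope by 10 gives the estimated change in mean mpg for 10 extra horsepower, about -1.58 mpg based on the coefficient in the summary output.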

iv. What is the predicted mpg associated with a horsepower of 98? What are the associated 95 % confidence and prediction intervals?

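The predicted mpg for a horsepower of 98 is about 24.47. The 95% confidence interval:

predict(lm.fit, data.frame(horsepower = 98), interval = "confidence")

##        fit      lwr      upr
## 1 24.46708 23.97308 24.96108

And the 95% prediction interval:

predict(lm.fit, data.frame(horsepower = 98), interval = "prediction")

##        fit     lwr      upr
## 1 24.46708 14.8094 34.12476

As expected, the prediction interval is wider than the confidence interval. Both are centered on the same fitted value, but the confidence interval only reflects uncertainty about the average mpg of cars with 98 horsepower, whereas the prediction interval also includes the irreducible error ε of an individual car.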
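The two intervals differ only in the standard error they use. A minimal sketch that rebuilds them from predict(..., se.fit = TRUE) at horsepower = 98 (the value the question asks about), assuming the usual t-based intervals: the confidence interval uses the standard error of the fitted mean, while the prediction interval first adds the estimated error variance sigma^2. The results should agree with predict(..., interval = ...).

```r
library("ISLR")  # provides the Auto data set

lm.fit <- lm(mpg ~ horsepower, data = Auto)
new_obs <- data.frame(horsepower = 98)

# Point estimate and standard error of the mean response at horsepower = 98
p <- predict(lm.fit, new_obs, se.fit = TRUE)
t_crit <- qt(0.975, df = df.residual(lm.fit))
sigma_hat <- summary(lm.fit)$sigma

# Confidence interval: uncertainty in the estimated mean only
ci_manual <- p$fit + c(-1, 1) * t_crit * p$se.fit

# Prediction interval: also adds the irreducible error variance sigma^2
pi_manual <- p$fit + c(-1, 1) * t_crit * sqrt(p$se.fit^2 + sigma_hat^2)

rbind(confidence = ci_manual, prediction = pi_manual)
```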

Question-b

Plot the response and the predictor. Use the abline() function to display the least squares regression line.

# Scatter plot of the response against the predictor
plot(mpg ~ horsepower, data = Auto, main = "MPG vs Horsepower",
     xlab = "Horsepower", ylab = "MPG")
# Overlay the least squares regression line from lm.fit
abline(lm.fit, col = "red")
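The red line drawn by abline() is the least squares line. As a sketch, its coefficients can be recovered from the closed-form estimates beta1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2) and beta0 = mean(y) - beta1 * mean(x), which should match coef(lm.fit):

```r
library("ISLR")  # provides the Auto data set

x <- Auto$horsepower
y <- Auto$mpg

# Closed-form least squares estimates for simple linear regression
beta1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0 <- mean(y) - beta1 * mean(x)

# Compare with the coefficients estimated by lm()
lm.fit <- lm(mpg ~ horsepower, data = Auto)
rbind(closed_form = c(beta0, beta1), lm = coef(lm.fit))
```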

Question-c

Use the plot() function to produce diagnostic plots of the least squares regression fit. Comment on any problems you see with the fit.

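par(mfrow = c(2, 2))  # arrange the four diagnostic plots in a 2 x 2 grid
plot(lm.fit)          # residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage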

The first plot (Residuals vs Fitted) shows a U-shaped pattern between the residuals and the fitted values. This indicates a non-linear relationship between the predictor and the response that the straight-line fit does not capture. The second plot (Normal Q-Q) shows that the residuals are approximately normally distributed. The third plot (Scale-Location) suggests that the variance of the errors is roughly constant. Finally, the fourth plot (Residuals vs Leverage) shows no observation with a large Cook's distance, so no single point appears to be unduly influential.
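The visual checks can be backed by a few numbers. A minimal sketch, assuming the common rule of thumb that a leverage above three times the average (p + 1)/n deserves a closer look (that cutoff is an assumption, not part of the exercise): it counts high-leverage observations, reports the largest Cook's distance, and averages the residuals within five equal-width bins of the fitted values, where a U-shape would appear as positive means at both ends and negative means in the middle.

```r
library("ISLR")  # provides the Auto data set

lm.fit <- lm(mpg ~ horsepower, data = Auto)

# Leverage: hat values always average (p + 1) / n, here 2 / nrow(Auto)
lev <- hatvalues(lm.fit)
avg_lev <- mean(lev)
high_lev <- which(lev > 3 * avg_lev)  # assumed rule-of-thumb cutoff
length(high_lev)

# Influence: Cook's distance combines leverage and residual size
cd <- cooks.distance(lm.fit)
max(cd)

# Non-linearity: mean residual within five equal-width bins of fitted values
bins <- cut(fitted(lm.fit), breaks = 5)
tapply(residuals(lm.fit), bins, mean)
```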

Ahmed Tadde

May 8, 2015