#This problem involves the use of simple linear regression on the Auto data set which is in the ISLR package.
#Perform a simple linear regression with mpg as the response and horsepower as the predictor.
#Use the summary() function to print the results. Comment on the output.
library(ISLR)
#Load the Auto data set into the variable data
data <- Auto
#Summary statistics for each column of Auto
summary(data)
## mpg cylinders displacement horsepower
## Min. : 9.00 Min. :3.000 Min. : 68.0 Min. : 46.0
## 1st Qu.:17.00 1st Qu.:4.000 1st Qu.:105.0 1st Qu.: 75.0
## Median :22.75 Median :4.000 Median :151.0 Median : 93.5
## Mean :23.45 Mean :5.472 Mean :194.4 Mean :104.5
## 3rd Qu.:29.00 3rd Qu.:8.000 3rd Qu.:275.8 3rd Qu.:126.0
## Max. :46.60 Max. :8.000 Max. :455.0 Max. :230.0
##
## weight acceleration year origin
## Min. :1613 Min. : 8.00 Min. :70.00 Min. :1.000
## 1st Qu.:2225 1st Qu.:13.78 1st Qu.:73.00 1st Qu.:1.000
## Median :2804 Median :15.50 Median :76.00 Median :1.000
## Mean :2978 Mean :15.54 Mean :75.98 Mean :1.577
## 3rd Qu.:3615 3rd Qu.:17.02 3rd Qu.:79.00 3rd Qu.:2.000
## Max. :5140 Max. :24.80 Max. :82.00 Max. :3.000
##
## name
## amc matador : 5
## ford pinto : 5
## toyota corolla : 5
## amc gremlin : 4
## amc hornet : 4
## chevrolet chevette: 4
## (Other) :365
#Fit a simple linear regression of mpg on horsepower
lrm <- lm(mpg ~ horsepower, data = data)
#Print the coefficient table and fit statistics for the model
summary(lrm)
##
## Call:
## lm(formula = mpg ~ horsepower, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.5710 -3.2592 -0.3435 2.7630 16.9240
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.935861 0.717499 55.66 <2e-16 ***
## horsepower -0.157845 0.006446 -24.49 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.906 on 390 degrees of freedom
## Multiple R-squared: 0.6059, Adjusted R-squared: 0.6049
## F-statistic: 599.7 on 1 and 390 DF, p-value: < 2.2e-16
#(a) Is there a relationship between the predictor and the response?
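#Yes. The p-value of the horsepower coefficient is below 2e-16 (three-star significance),
#far below 0.05, and the F-statistic of 599.7 has a p-value below 2.2e-16, so we reject
#the null hypothesis that the slope is zero.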
#(b) How strong is the relationship between the predictor and the response?
#The R^2 value of 0.6059 indicates that the predictor horsepower explains about 60.6% of
#the variation in the response mpg.
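#A rough way to put the strength in context, using only figures copied from the
#summary(lrm) and summary(data) output above. The variable names here are
#illustrative and not part of the original exercise.
rse_out  <- 4.906     #residual standard error from summary(lrm)
mean_mpg <- 23.45     #mean of mpg from summary(data)
r2_out   <- 0.6059    #multiple R-squared from summary(lrm)
#Residual standard error as a percentage of the mean response (roughly 21%)
pct_error <- rse_out / mean_mpg * 100
#In simple linear regression R^2 equals the squared correlation, so the
#correlation is -sqrt(R^2); the minus sign comes from the negative slope
r_mpg_hp <- -sqrt(r2_out)
print(c(pct_error = pct_error, correlation = r_mpg_hp))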
#(c) Is the relationship between the predictor and the response positive or negative?
#Negative. The estimated slope is -0.157845, so predicted mpg falls as horsepower rises.
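#A small sketch of how the slope can be read, using the estimate and standard
#error copied from the coefficient table above (names are illustrative).
#The 95% interval uses the t distribution with the 390 residual degrees of freedom.
slope_est <- -0.157845
slope_se  <- 0.006446
t_crit_slope <- qt(0.975, df = 390)
slope_ci <- slope_est + c(-1, 1) * t_crit_slope * slope_se
#Expected change in mpg for a 10-unit increase in horsepower
change_per_10hp <- 10 * slope_est
print(slope_ci)
print(change_per_10hp)
#The whole interval lies below zero, which agrees with the negative relationship.
#With the fitted model in memory, confint(lrm) should give essentially the same interval.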
#(d) What is the predicted mpg associated with a horsepower of 98? What are the associated 95%
#confidence and prediction intervals?
#95% prediction interval for the mpg of a single car with horsepower = 98
predict(lrm, data.frame(horsepower = 98), interval = "prediction")
## fit lwr upr
## 1 24.46708 14.8094 34.12476
#95% confidence interval for the mean mpg of cars with horsepower = 98
predict(lrm, data.frame(horsepower = 98), interval = "confidence")
## fit lwr upr
## 1 24.46708 23.97308 24.96108
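#A hand check of the numbers above, assuming only the values printed in the
#output (variable names are illustrative). It shows where the fitted value
#comes from and why the prediction interval is so much wider than the
#confidence interval: it adds the residual variance to the variance of the fit.
b0_out  <- 39.935861   #intercept estimate
b1_out  <- -0.157845   #horsepower estimate
rse_chk <- 4.906       #residual standard error
fit_98  <- b0_out + b1_out * 98          #about 24.467, matching the fit column
t_crit  <- qt(0.975, df = 390)
conf_half <- 24.96108 - 24.46708         #half-width of the confidence interval
se_fit    <- conf_half / t_crit          #implied standard error of the fitted mean
#Prediction half-width: t * sqrt(residual variance + variance of the fitted mean)
pred_half <- t_crit * sqrt(rse_chk^2 + se_fit^2)
print(c(fit = fit_98, pred_lwr = fit_98 - pred_half, pred_upr = fit_98 + pred_half))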
#2.Write out the model in equation form.
#mpg = b0 + b1 * horsepower, with estimates b0 = 39.935861 and b1 = -0.157845:
#mpg = 39.935861 - 0.157845 * horsepower
#3.Plot the response and the predictor.
#Display the least squares regression line on the plot.
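#Scatter plot of the response (mpg) against the predictor (horsepower)
plot(data$horsepower, data$mpg, xlab = "horsepower", ylab = "mpg")
#Overlay the fitted least squares line
abline(lrm, lwd = 5, col = "red")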

#4.Produce diagnostic plots of the least squares regression fit.
#Comment on any problems you see with the fit.
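#Observation with the largest leverage; the output gives its row name (117)
#and, below it, its position in the data (116)
which.max(hatvalues(lrm))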
## 117
## 116
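#For context, a sketch of the usual yardstick for leverage: the average
#leverage is (p + 1) / n, and points far above it are worth a closer look.
#Here n is recovered from the 390 residual degrees of freedom plus the 2
#estimated coefficients; the names are illustrative.
n_obs  <- 390 + 2
n_coef <- 2
avg_leverage <- n_coef / n_obs
print(avg_leverage)
#With the fitted model in memory, max(hatvalues(lrm)) / avg_leverage shows
#how many times the average the largest leverage is.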
par(mfrow = c(2,2))
plot(lrm)

#The Residuals vs Fitted plot shows a clear curved pattern, so the linearity assumption is not met.
#The Q-Q plot looks fairly close to a straight line, so the residuals appear roughly normally distributed.
#The non-linearity also shows up in the other diagnostic plots.
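#One possible follow-up, not part of the original exercise: a minimal sketch
#that adds a squared horsepower term and compares it with the linear fit.
#The names lin_fit and quad_fit are illustrative; Auto comes from the ISLR package.
library(ISLR)
lin_fit  <- lm(mpg ~ horsepower, data = Auto)
quad_fit <- lm(mpg ~ horsepower + I(horsepower^2), data = Auto)
#F-test of the nested models: a small p-value favours the quadratic term
print(anova(lin_fit, quad_fit))
#R-squared of both fits, for a quick comparison
print(c(linear = summary(lin_fit)$r.squared, quadratic = summary(quad_fit)$r.squared))
#If the curve in the residual plot is due to a missing quadratic term,
#plot(quad_fit) should show a flatter Residuals vs Fitted panel.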