#This problem involves the use of simple linear regression on the Auto data set which is in the ISLR package.
#Perform a simple linear regression with mpg as the response and horsepower as the predictor. 
#Use the summary() function to print the results.  Comment on the output.  
library(ISLR)
#load the Auto data into the variable data
data <- Auto
#summary of every variable in the Auto data
summary(data)
##       mpg          cylinders      displacement     horsepower   
##  Min.   : 9.00   Min.   :3.000   Min.   : 68.0   Min.   : 46.0  
##  1st Qu.:17.00   1st Qu.:4.000   1st Qu.:105.0   1st Qu.: 75.0  
##  Median :22.75   Median :4.000   Median :151.0   Median : 93.5  
##  Mean   :23.45   Mean   :5.472   Mean   :194.4   Mean   :104.5  
##  3rd Qu.:29.00   3rd Qu.:8.000   3rd Qu.:275.8   3rd Qu.:126.0  
##  Max.   :46.60   Max.   :8.000   Max.   :455.0   Max.   :230.0  
##                                                                 
##      weight      acceleration        year           origin     
##  Min.   :1613   Min.   : 8.00   Min.   :70.00   Min.   :1.000  
##  1st Qu.:2225   1st Qu.:13.78   1st Qu.:73.00   1st Qu.:1.000  
##  Median :2804   Median :15.50   Median :76.00   Median :1.000  
##  Mean   :2978   Mean   :15.54   Mean   :75.98   Mean   :1.577  
##  3rd Qu.:3615   3rd Qu.:17.02   3rd Qu.:79.00   3rd Qu.:2.000  
##  Max.   :5140   Max.   :24.80   Max.   :82.00   Max.   :3.000  
##                                                                
##                  name    
##  amc matador       :  5  
##  ford pinto        :  5  
##  toyota corolla    :  5  
##  amc gremlin       :  4  
##  amc hornet        :  4  
##  chevrolet chevette:  4  
##  (Other)           :365
#fit the simple linear regression of mpg on horsepower
lrm <- lm(mpg~horsepower, data = data)
#summary of the fitted model
summary(lrm)
## 
## Call:
## lm(formula = mpg ~ horsepower, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.5710  -3.2592  -0.3435   2.7630  16.9240 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 39.935861   0.717499   55.66   <2e-16 ***
## horsepower  -0.157845   0.006446  -24.49   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.906 on 390 degrees of freedom
## Multiple R-squared:  0.6059, Adjusted R-squared:  0.6049 
## F-statistic: 599.7 on 1 and 390 DF,  p-value: < 2.2e-16
#(a)    Is there a relationship between the predictor and the response?
# Yes. The p-value for horsepower is < 2e-16, far below 0.05 (three-star significance), so we
#reject the null hypothesis that the slope is zero. The F-statistic of 599.7 gives the same conclusion.
#(b)    How strong is the relationship between the predictor and the response?
#The R^2 value of 0.6059 indicates that the predictor horsepower explains about 61% of the
#variation in the response mpg. The residual standard error of 4.906 is about 21% of the mean mpg (23.45).
#(c)    Is the relationship between the predictor and the response positive or negative?
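# Negative. The horsepower coefficient is -0.157845, so each additional unit of horsepower is
#associated with a decrease of about 0.158 mpg on average.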
#(d)    What is the predicted mpg associated with a horsepower of 98?  What are the associated 95% 
#confidence and prediction intervals?
#predicted mpg (fit column) with the 95% prediction interval for a single car at horsepower 98
predict(lrm, data.frame(horsepower = 98), interval = "prediction")
##        fit     lwr      upr
## 1 24.46708 14.8094 34.12476
#same prediction with the 95% confidence interval for the mean mpg at horsepower 98
predict(lrm, data.frame(horsepower = 98), interval = "confidence")
##        fit      lwr      upr
## 1 24.46708 23.97308 24.96108
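#A rough hand check of the two intervals above. This is only a sketch: it uses the rounded
#values printed by summary(lrm) and summary(data), so it should agree with predict() to
#roughly two decimals, not exactly. The names b0, b1, rse, n, xbar, sxx, x0, fit, tcrit,
#se_mean and se_pred are introduced here for illustration.
b0   <- 39.935861            #intercept estimate from summary(lrm)
b1   <- -0.157845            #horsepower slope estimate from summary(lrm)
rse  <- 4.906                #residual standard error
n    <- 392                  #390 residual degrees of freedom + 2 coefficients
xbar <- 104.5                #mean horsepower, as rounded in summary(data)
sxx  <- (rse / 0.006446)^2   #sum of squared deviations of horsepower, from SE(b1) = rse / sqrt(sxx)
x0   <- 98
fit   <- b0 + b1 * x0                                    #point prediction
tcrit <- qt(0.975, df = n - 2)                           #two-sided 95% t quantile
se_mean <- rse * sqrt(1 / n + (x0 - xbar)^2 / sxx)       #standard error of the mean response
se_pred <- rse * sqrt(1 + 1 / n + (x0 - xbar)^2 / sxx)   #standard error for one new car
c(fit = fit, lwr = fit - tcrit * se_mean, upr = fit + tcrit * se_mean)   #confidence interval
c(fit = fit, lwr = fit - tcrit * se_pred, upr = fit + tcrit * se_pred)   #prediction interval
#The prediction interval is much wider because it adds the irreducible error of a single car
#(the extra 1 under the square root) to the uncertainty in the fitted line.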
#2.Write out the model in equation form.
#mpg = b0 + b1 * horsepower + error
#Fitted model: predicted mpg = 39.936 - 0.158 * horsepower

#3.Plot the response and the predictor.
#Display the least squares regression line on the plot.
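#scatterplot of the response (mpg) against the predictor (horsepower)
plot(data$horsepower, data$mpg, xlab = "horsepower", ylab = "mpg")
#overlay the fitted least squares line
abline(lrm, lwd = 5, col = "red")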

#4.Produce diagnostic plots of the least squares regression fit.  
#Comment on any problems you see with the fit.
#observation with the largest leverage (hat value); the output gives its row name and its index
which.max(hatvalues(lrm))
## 117 
## 116
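#A small sketch that puts the largest hat value in context. It reuses lrm from above.
#hv and avg_leverage are names introduced here, and the 3x-average cutoff is a common
#rule of thumb assumed for illustration, not something the exercise specifies.
hv <- hatvalues(lrm)
avg_leverage <- length(coef(lrm)) / nobs(lrm)   #average leverage is (p + 1) / n
max(hv) / avg_leverage                          #largest leverage as a multiple of the average
which(hv > 3 * avg_leverage)                    #observations above the assumed 3x cutoff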
par(mfrow = c(2,2))
plot(lrm)

#The Residuals vs Fitted plot shows a clear curved pattern, so the linearity assumption is not met.
#The Normal Q-Q plot follows the reference line fairly closely, so the residuals look approximately normal.
#The non-linearity also shows up in the other diagnostic plots.
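
#One possible follow-up to the curvature noted above, sketched under the assumption that a
#quadratic term in horsepower is an acceptable remedy (the exercise does not ask for this).
#lrm2 is a name introduced here; lrm and data come from the code above.
lrm2 <- lm(mpg ~ horsepower + I(horsepower^2), data = data)
anova(lrm, lrm2)        #F-test: does the squared term improve on the straight line?
par(mfrow = c(2, 2))
plot(lrm2)              #re-check the residual plots for any remaining curvature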