#Simple Linear Regression - cars dataset
#Visualization
lm_car <- lm(dist~speed, data=cars)
plot(cars$speed, cars$dist)
abline(lm_car)
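#The fitted line above is the ordinary least squares (OLS) solution. The sketch below assumes only base R and the built-in cars data.
#It recomputes the slope and intercept from the closed-form single-predictor formulas b_1 = cov(x, y) / var(x) and b_0 = mean(y) - b_1 * mean(x), then compares them with coef(lm_car).
```r
# Refit so this block runs on its own
lm_car <- lm(dist ~ speed, data = cars)

x <- cars$speed
y <- cars$dist

# Closed-form OLS estimates for a single predictor
b1 <- cov(x, y) / var(x)          # slope
b0 <- mean(y) - b1 * mean(x)      # intercept

ols_manual <- c(b0, b1)
names(ols_manual) <- c("(Intercept)", "speed")

print(ols_manual)
print(coef(lm_car))
# Both should agree up to floating-point error
print(all.equal(ols_manual, coef(lm_car)))
```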

#Model evaluation and diagnostics
summary(lm_car)
##
## Call:
## lm(formula = dist ~ speed, data = cars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -29.069  -9.525  -2.272   9.215  43.201
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
## F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
#The p-value for speed (1.49e-12) is far below 0.05, so we reject the null hypothesis that the speed coefficient is zero, i.e., that speed has no linear association with dist.
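#For reference, each t value in the coefficient table is Estimate / Std. Error. Pr(>|t|) is the two-sided p-value from a t distribution with the residual degrees of freedom, which is 48 here.
#The sketch below recomputes both from summary(lm_car)$coefficients as a check on how the test above works.
```r
lm_car <- lm(dist ~ speed, data = cars)
coef_tab <- summary(lm_car)$coefficients   # columns: Estimate, Std. Error, t value, Pr(>|t|)
df_resid <- df.residual(lm_car)             # n - 2 = 48

# t statistic for H0: coefficient = 0
t_manual <- coef_tab[, "Estimate"] / coef_tab[, "Std. Error"]
# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df = df_resid)

print(cbind(t_manual, p_manual))
```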
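#Because the slope coefficient of speed (b_1; b_0 is the intercept) is statistically significant, we can interpret it: each additional 1 mph of speed is associated with an increase of about 3.93 ft in expected stopping distance.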
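#To make the marginal-effect interpretation concrete, the sketch below predicts stopping distance at two speeds one mph apart. The speeds 10 and 11 are arbitrary values chosen for illustration.
#The difference between the two predictions equals the slope. A 95% confidence interval for the slope is added via confint().
```r
lm_car <- lm(dist ~ speed, data = cars)

# Hypothetical speeds one mph apart, for illustration only
new_speeds <- data.frame(speed = c(10, 11))
pred <- predict(lm_car, newdata = new_speeds)

# Change in predicted dist per extra mph; equals the speed coefficient
marginal <- unname(pred[2] - pred[1])
print(marginal)

# 95% confidence interval for the speed coefficient
print(confint(lm_car, "speed", level = 0.95))
```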
#The multiple R-squared of 0.6511 means speed explains about 65% of the variation in dist (adjusted R-squared: 0.6438). That is a moderate fit: not very high (say, 0.9), but not low either.
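#Both R-squared values follow from the residual and total sums of squares: R^2 = 1 - SSE/SST and adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1). Here n is the number of observations and p the number of predictors.
#The sketch below recomputes them with base R and checks them against summary(lm_car).
```r
lm_car <- lm(dist ~ speed, data = cars)
y <- cars$dist
n <- nrow(cars)                          # 50 observations
p <- 1                                   # one predictor (speed)

sse <- sum(resid(lm_car)^2)              # residual sum of squares
sst <- sum((y - mean(y))^2)              # total sum of squares

r2 <- 1 - sse / sst
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(c(r2 = r2, adj_r2 = adj_r2))
```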
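#Show the four standard diagnostic plots in a 2x2 grid, then restore the default layout
par(mfrow = c(2, 2))
plot(lm_car)
par(mfrow = c(1, 1))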




#From the diagnostic plots, most residuals are centered around 0, and the points in the normal Q-Q plot follow the theoretical line fairly well. A few outliers, however, deviate from it by a large margin.
#Together, these plots suggest the residuals are approximately, but not exactly, normally distributed. Strictly, the Q-Q plot is the direct check of normality; the Residuals vs Fitted plot mainly checks linearity and constant variance.
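#The visual check can be complemented by a formal test. The sketch below uses base R's shapiro.test(), the Shapiro-Wilk test, which is not part of the original analysis.
#It is applied to the residuals under the null hypothesis that they are normally distributed, so a small p-value would indicate evidence against normality.
#The Q-Q plot here uses raw residuals, whereas plot(lm_car) uses standardized ones.
```r
lm_car <- lm(dist ~ speed, data = cars)
r <- resid(lm_car)

# Shapiro-Wilk test; H0: residuals come from a normal distribution
sw <- shapiro.test(r)
print(sw)

# Normal Q-Q plot of the raw residuals with a reference line
qqnorm(r)
qqline(r)
```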
#Residual analysis - Histogram and Summary
hist(resid(lm_car))

summary(resid(lm_car))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## -29.069  -9.525  -2.272   0.000   9.215  43.201
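#The residuals are centered around 0. Their mean is exactly 0, as is always the case for least squares with an intercept. However, the mean is greater than the median (-2.272), and the maximum (43.201) lies farther from 0 than the minimum (-29.069), which suggests the residual distribution is right (positively) skewed.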
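#As a rough numeric check of the skewness claim, the sketch below computes the moment-based sample skewness of the residuals. It uses the mean cubed deviation divided by the cubed population standard deviation; other definitions with small-sample corrections exist and give slightly different values.
#A positive value indicates a longer right tail. The block also confirms that the residual mean is 0 and that mean > median.
```r
lm_car <- lm(dist ~ speed, data = cars)
r <- resid(lm_car)

# Least squares with an intercept forces the residuals to average to zero
print(mean(r))

# Moment-based sample skewness
m <- mean(r)
skew <- mean((r - m)^3) / mean((r - m)^2)^(3 / 2)
print(skew)

# Heuristic used in the text: mean greater than median suggests right skew
print(mean(r) > median(r))
```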