Simple linear regression

View(mtcars)
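# Predictor (wt) on the x-axis, response (mpg) on the y-axis, matching the model below
plot(mtcars$wt, mtcars$mpg)

# Same scatter plot with a loess smoothing curve added
scatter.smooth(mtcars$wt, mtcars$mpg)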

Scatter plot

There is a negative relationship between miles per gallon and weight: heavier cars tend to have lower mileage.
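
The strength of this relationship can be quantified with the Pearson correlation coefficient. The following is a minimal sketch using base R's cor(); the variable name r is only illustrative.

```r
# Pearson correlation between weight and miles per gallon
r <- cor(mtcars$wt, mtcars$mpg)
r   # roughly -0.87, a strong negative linear association
```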

Build model

model <- lm(mtcars$mpg ~ mtcars$wt)

summary(model)
## 
## Call:
## lm(formula = mtcars$mpg ~ mtcars$wt)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## mtcars$wt    -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

Significance of model:

The regression coefficients, 37.2851 (intercept) and -5.3445 (slope for weight), can be interpreted as follows:

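  1. When the weight of a car is zero, the predicted mileage is 37.2851 miles per gallon. No car weighs zero, so the intercept is an extrapolation rather than a physically meaningful value.

  2. If the weight of a car increases by 1 unit (1000 lb in mtcars), the predicted mileage decreases by 5.3445 miles per gallon.

  3. If the weight of a car decreases by 1 unit, the predicted mileage increases by 5.3445 miles per gallon.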
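
These interpretations can be checked numerically. The sketch below refits the same regression with the data argument, an assumption made here so that predict() can accept new data. The names fit, new_cars and pred, and the example weights, are illustrative.

```r
# Same regression as above, written with `data =` so predict() can use newdata
fit <- lm(mpg ~ wt, data = mtcars)

coef(fit)   # intercept and slope

# Predicted mileage for hypothetical cars weighing 3000 lb and 4000 lb
# (wt is recorded in units of 1000 lb)
new_cars <- data.frame(wt = c(3, 4))
pred <- predict(fit, newdata = new_cars)
pred

# The difference between the two predictions equals the slope
diff(pred)
```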

The Multiple R-squared value tells how much of the variation in the ‘y’ variable is explained by the ‘x’ variable. In this case, a multiple R-squared value of 0.7528 means that weight explains 75.28% of the variance in miles per gallon in the given data.
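
As a sketch of where this number comes from, R-squared can be recomputed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The names ss_res, ss_tot and r2 are illustrative.

```r
model <- lm(mtcars$mpg ~ mtcars$wt)

ss_res <- sum(residuals(model)^2)                  # unexplained variation
ss_tot <- sum((mtcars$mpg - mean(mtcars$mpg))^2)   # total variation in mpg
r2 <- 1 - ss_res / ss_tot

r2                             # about 0.7528
summary(model)$r.squared       # value reported by summary()
cor(mtcars$mpg, mtcars$wt)^2   # in simple regression, R-squared equals r^2
```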

Model assumptions:

1. Check for linearity.

plot(model,1)

The residuals-versus-fitted plot shows a curved pattern rather than a flat band around zero, which suggests that the relationship is not purely linear.
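
One way to probe the curvature numerically, offered as a sketch rather than as part of the original analysis, is to add a squared weight term and compare the two nested models with anova(). The names fit_lin, fit_quad, cmp and p_quad are illustrative.

```r
fit_lin  <- lm(mpg ~ wt, data = mtcars)
fit_quad <- lm(mpg ~ wt + I(wt^2), data = mtcars)

# F-test for whether the quadratic term improves the fit
cmp <- anova(fit_lin, fit_quad)
cmp

# p-value of the comparison (second row of the anova table)
p_quad <- cmp[["Pr(>F)"]][2]
p_quad
```

A small p-value here would support the visual impression that a straight line does not fully capture the relationship.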

2. Check distribution of residuals.

plot(model,2)

The Q-Q plot suggests that the residuals are approximately normally distributed: most points lie close to the reference line, but a few points deviate from it and may be outliers.
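
The visual check can be complemented with a formal test. The sketch below applies the Shapiro-Wilk test from base R to the residuals. Using this particular test is an assumption made here, and the name sw is illustrative.

```r
model <- lm(mtcars$mpg ~ mtcars$wt)

# Null hypothesis: the residuals come from a normal distribution
sw <- shapiro.test(residuals(model))
sw
```

A p-value above the chosen significance level means that normality is not rejected. With only 32 observations the test has limited power, so it is best read together with the Q-Q plot.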

3. Homogeneity of variance of the residuals.

plot(model,3)

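The scale-location plot shows that the spread of the residuals is not constant across the fitted values (heteroscedasticity). This means the size of the model's prediction errors depends on the fitted value, so the errors are larger in some ranges than in others.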
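
A rough numerical check of constant variance can be done with base R only. The sketch below hand-computes a studentized Breusch-Pagan style statistic by regressing the squared residuals on the predictor. This choice of test is an assumption, and the names e2, aux, bp_stat and bp_p are illustrative.

```r
model <- lm(mtcars$mpg ~ mtcars$wt)

# Auxiliary regression: squared residuals on the predictor
e2  <- residuals(model)^2
aux <- lm(e2 ~ mtcars$wt)

# Test statistic n * R^2, compared against a chi-square with 1 degree of freedom
n       <- length(e2)
bp_stat <- n * summary(aux)$r.squared
bp_p    <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
c(statistic = bp_stat, p.value = bp_p)
```

A small p-value would indicate that the residual variance changes with weight. This auxiliary regression only detects variance that changes linearly with the predictor, so the plot remains the primary diagnostic.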

Influential observations

plot(model,4)

Influential observations are values that noticeably affect the fitted model when they are included. In the Cook's distance plot, the 17th, 18th and 20th observations stand out.
#Label the 5 observations with the largest Cook's distance.
plot(model,4,id.n=5)
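
The values behind this plot can also be inspected directly with cooks.distance(). In the sketch below the names cd and top3 are illustrative, and the 4 / n cutoff is a common rule of thumb assumed here, not a strict threshold.

```r
model <- lm(mtcars$mpg ~ mtcars$wt)

cd <- cooks.distance(model)

# Row indices of the three largest Cook's distances
top3 <- order(cd, decreasing = TRUE)[1:3]
top3
cd[top3]

# Observations above the 4 / n rule-of-thumb cutoff
which(cd > 4 / length(cd))
```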

Evaluation of errors

#MAE - mean absolute error
mean(abs(model$residuals))
## [1] 2.340642
#MAPE - mean absolute percentage error, relative to the observed mpg (the response), as a proportion
mean(abs(model$residuals)/mtcars$mpg)
#MSE - Mean square error
mean(model$residuals^2)
## [1] 8.697561
#RMSE - root mean square error
sqrt(mean(model$residuals^2))
## [1] 2.949163

Whereas R-squared is a relative measure of fit, RMSE is an absolute measure of fit. RMSE can be interpreted as the standard deviation of the residuals, that is, of the variation the model leaves unexplained, and it has the useful property of being in the same units as the response variable (here, miles per gallon). Lower values of RMSE indicate a better fit.
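
The RMSE above (2.949) is slightly smaller than the residual standard error reported by summary() (3.046). The sketch below shows that the two differ only in the divisor: RMSE divides by n, while the residual standard error divides by the residual degrees of freedom. The names res, rmse and rse are illustrative.

```r
model <- lm(mtcars$mpg ~ mtcars$wt)
res   <- residuals(model)

rmse <- sqrt(mean(res^2))                       # divides by n = 32
rse  <- sqrt(sum(res^2) / df.residual(model))   # divides by n - 2 = 30

rmse           # about 2.949
rse            # about 3.046, the "Residual standard error" in summary()
sigma(model)   # same value as rse
```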