In this example we will use the built-in mtcars dataset to build a multiple regression model.
The aim is to predict car mileage (mpg) from displacement (disp), horsepower (hp), rear axle ratio (drat) and weight (wt).
1] Data Analysis
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
attach(mtcars)
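Before fitting, it can help to see how strongly the chosen variables are related to mileage and to each other. The following is a minimal sketch using base R only; the names `vars` and `cor_matrix` are introduced here for illustration and are not part of the original example.

```r
# Columns used in this example: the response (mpg) and the four predictors.
vars <- c("mpg", "disp", "hp", "drat", "wt")

# Pairwise Pearson correlations, rounded for readability.
cor_matrix <- round(cor(mtcars[, vars]), 2)
print(cor_matrix)

# Scatterplot matrix of the same columns, to eyeball the relationships.
pairs(mtcars[, vars])
```

Predictors that are strongly correlated with each other tend to share explanatory power, which is worth keeping in mind when reading the individual p-values later on.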
2] Build Model
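# Model formula: mpg as a function of displacement, horsepower, rear axle ratio and weight
model <- mpg ~ disp + hp + drat + wt
# Fit the linear model by ordinary least squares
fit <- lm(model, data = mtcars)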
fit
##
## Call:
## lm(formula = model, data = mtcars)
##
## Coefficients:
## (Intercept)         disp           hp         drat           wt
##   29.148738     0.003815    -0.034784     1.768049    -3.479668
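Since the aim is prediction, the fitted coefficients can be applied to a new car. The sketch below refits the same model so that it runs on its own, then compares `predict()` with the same calculation done by hand. The car in `new_car` is hypothetical: its values are invented for illustration and do not come from mtcars.

```r
# Refit the model from the text so this block is self-contained.
fit <- lm(mpg ~ disp + hp + drat + wt, data = mtcars)

# A hypothetical car (invented values, not a row of mtcars).
new_car <- data.frame(disp = 200, hp = 120, drat = 3.5, wt = 3.0)

# Prediction using predict().
pred <- predict(fit, newdata = new_car)

# The same prediction by hand: intercept + sum of coefficient * value.
b <- coef(fit)
by_hand <- b["(Intercept)"] +
  b["disp"] * new_car$disp +
  b["hp"]   * new_car$hp +
  b["drat"] * new_car$drat +
  b["wt"]   * new_car$wt

print(unname(pred))
print(unname(by_hand))
```

The two printed values should agree, which makes explicit that a linear model's prediction is just the intercept plus a weighted sum of the predictor values.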
3] Model Summary
summary(fit)
##
## Call:
## lm(formula = model, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.5077 -1.9052 -0.5057  0.9821  5.6883
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.148738   6.293588   4.631  8.2e-05 ***
## disp         0.003815   0.010805   0.353  0.72675
## hp          -0.034784   0.011597  -2.999  0.00576 **
## drat         1.768049   1.319779   1.340  0.19153
## wt          -3.479668   1.078371  -3.227  0.00327 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.602 on 27 degrees of freedom
## Multiple R-squared: 0.8376, Adjusted R-squared: 0.8136
## F-statistic: 34.82 on 4 and 27 DF, p-value: 2.704e-10
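From the model summary we can conclude that:
1. Horsepower and weight are statistically significant predictors of mileage (p < 0.01 for both).
2. Displacement and drat do not add much once horsepower and weight are in the model (p = 0.73 and p = 0.19).
3. The negative coefficients for horsepower and weight indicate that predicted mileage falls as either one increases, and rises as either one decreases, with the other predictors held fixed. Each additional 1000 lb of weight is associated with about 3.48 mpg less, and each additional horsepower with about 0.035 mpg less.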
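One way to check the second conclusion is to compare the full model with a reduced model that drops displacement and drat. The sketch below uses an F-test for nested models via `anova()`; the names `full`, `reduced` and `cmp` are introduced here for illustration.

```r
# Full model from the text, and a reduced model without disp and drat.
full    <- lm(mpg ~ disp + hp + drat + wt, data = mtcars)
reduced <- lm(mpg ~ hp + wt, data = mtcars)

# F-test for nested models: do disp and drat jointly improve the fit?
cmp <- anova(reduced, full)
print(cmp)

# Adjusted R-squared of both models, side by side.
print(c(full    = summary(full)$adj.r.squared,
        reduced = summary(reduced)$adj.r.squared))
```

A large p-value in the second row of the `anova()` table would support dropping the two predictors. A similar adjusted R-squared for both models would point the same way.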
4] Residual Analysis
plot(fit$residuals)
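The residuals are not balanced around zero on the y axis: the largest positive residual (5.69) is clearly bigger in magnitude than the largest negative one (-3.51), which suggests some outliers in the data.
The residuals also show a somewhat non-linear pattern, which suggests that the relationship between mileage and the predictor variables is not strictly linear.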
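`plot(fit$residuals)` draws the residuals against row index. A common complementary check is to plot them against the fitted values and to list the observations with the largest residuals. The following is a minimal sketch of that idea; the smoother and the choice of three observations are illustrative assumptions, not part of the original analysis.

```r
# Refit the model from the text so this block is self-contained.
fit <- lm(mpg ~ disp + hp + drat + wt, data = mtcars)
res <- residuals(fit)

# Residuals against fitted values, with a zero reference line.
plot(fitted(fit), res,
     xlab = "Fitted mpg", ylab = "Residual", main = "Residuals vs fitted")
abline(h = 0, lty = 2)

# A lowess smoother makes any curvature easier to see.
lines(lowess(fitted(fit), res), col = "red")

# The three cars with the largest absolute residuals (candidate outliers).
largest <- head(sort(abs(res), decreasing = TRUE), 3)
print(names(largest))
```

A smoother that bends away from the zero line would be consistent with the non-linearity noted above.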
qqnorm(fit$residuals)
The residuals deviate from a normal distribution, which suggests that weight and horsepower do not capture everything that drives mileage. Adding further predictors may improve the fit of the model.
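The visual impression from the Q-Q plot can be supplemented with a reference line and a formal test. The sketch below adds `qqline()` and runs a Shapiro-Wilk test on the residuals. The choice of this particular test is an assumption made here for illustration, and with only 32 observations its result should be read with caution.

```r
# Refit the model from the text so this block is self-contained.
fit <- lm(mpg ~ disp + hp + drat + wt, data = mtcars)
res <- residuals(fit)

# Q-Q plot with a reference line through the first and third quartiles.
qqnorm(res)
qqline(res)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal.
sw <- shapiro.test(res)
print(sw)
```

A small p-value would count as evidence against normality, while a large one only means the test found no clear departure from it.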