—————————————————————————

Student Name : Sachid Deshmukh

—————————————————————————

In this example we use R's built-in mtcars dataset to build a multiple regression model.

The aim is to predict car mileage (mpg) from displacement (disp), horsepower (hp), rear axle ratio (drat) and weight (wt).

1] Data Analysis

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
attach(mtcars)
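
Before fitting, it can help to see how strongly mileage is related to each candidate predictor. The sketch below is an illustration and not part of the original analysis; the names vars and cor_mat are introduced here only for the example.

```r
# Columns used in the model: the response first, then the four predictors
vars <- c("mpg", "disp", "hp", "drat", "wt")

# Pairwise Pearson correlations, rounded for readability
cor_mat <- round(cor(mtcars[, vars]), 2)
print(cor_mat)

# Scatterplot matrix of the same columns
pairs(mtcars[, vars], main = "mpg and candidate predictors")
```

Predictors that are highly correlated with each other can make individual coefficients harder to interpret, so the off-diagonal entries are worth reading alongside the mpg row.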

2] Build Model

model <- mpg~disp + hp + drat + wt
fit <- lm(model, mtcars)
fit
## 
## Call:
## lm(formula = model, data = mtcars)
## 
## Coefficients:
## (Intercept)         disp           hp         drat           wt  
##   29.148738     0.003815    -0.034784     1.768049    -3.479668
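
The coefficients define a linear equation for predicted mileage. As a minimal sketch, the code below predicts mpg for one car in two ways: with predict() and by multiplying the coefficients out by hand. The data frame new_car is a hypothetical name, and its values are copied from the Mazda RX4 row shown in the head() output.

```r
# Refit the same model so this block runs on its own
fit <- lm(mpg ~ disp + hp + drat + wt, data = mtcars)

# One car to predict for (values taken from the Mazda RX4 row)
new_car <- data.frame(disp = 160, hp = 110, drat = 3.90, wt = 2.620)

# Prediction using the fitted model object
pred <- predict(fit, newdata = new_car)

# The same prediction by hand: intercept + sum(coefficient * value)
manual <- sum(coef(fit) * c(1, new_car$disp, new_car$hp, new_car$drat, new_car$wt))

print(c(predict = unname(pred), manual = manual))
```

Both numbers should agree, which confirms that the model is just the intercept plus each coefficient times its predictor.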

3] Model Summary

summary(fit)
## 
## Call:
## lm(formula = model, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5077 -1.9052 -0.5057  0.9821  5.6883 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 29.148738   6.293588   4.631  8.2e-05 ***
## disp         0.003815   0.010805   0.353  0.72675    
## hp          -0.034784   0.011597  -2.999  0.00576 ** 
## drat         1.768049   1.319779   1.340  0.19153    
## wt          -3.479668   1.078371  -3.227  0.00327 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.602 on 27 degrees of freedom
## Multiple R-squared:  0.8376, Adjusted R-squared:  0.8136 
## F-statistic: 34.82 on 4 and 27 DF,  p-value: 2.704e-10

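From the model summary we can conclude that:

1. Horsepower and weight are statistically significant predictors of mileage (p < 0.01 for each).

2. Displacement and drat are not statistically significant once horsepower and weight are in the model (p = 0.73 and p = 0.19), so they add little to the prediction of mileage.

3. The negative coefficients for horsepower and weight indicate that, with the other predictors held constant, predicted mileage falls by about 0.035 mpg for each additional horsepower and by about 3.48 mpg for each additional 1000 lb of weight, and rises correspondingly as they decrease.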
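
One way to check the second conclusion is to compare the full model with a reduced model that drops displacement and drat. The sketch below uses a nested-model F test via anova() and 95% confidence intervals via confint(); the object names fit_full, fit_reduced, cmp and ci are introduced here for illustration only.

```r
# Full model from this example and a reduced model without disp and drat
fit_full    <- lm(mpg ~ disp + hp + drat + wt, data = mtcars)
fit_reduced <- lm(mpg ~ hp + wt, data = mtcars)

# F test for nested models: do disp and drat jointly improve the fit?
cmp <- anova(fit_reduced, fit_full)
print(cmp)

# 95% confidence intervals for the full-model coefficients;
# an interval that contains zero matches a non-significant t test
ci <- confint(fit_full, level = 0.95)
print(ci)
```

A large p-value in the anova() table would support dropping the two variables, while intervals for hp and wt that lie entirely below zero are consistent with the third conclusion.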

4] Residual Analysis

plot(fit$residuals)

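The residuals are not balanced around zero on the y axis: the largest positive residual (5.69) is well above the largest negative one (-3.51), which suggests some outliers in the data.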

The residuals also show a non-linear pattern, which indicates that the relationship between mileage and the predictor variables is not strictly linear.
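
plot(fit$residuals) draws the residuals against row order. A plot of residuals against fitted values usually makes curvature and outliers easier to see. The sketch below is one possible way to do this; the cutoff of 2 for standardized residuals is a common rule of thumb assumed here, not a value from the original analysis.

```r
# Refit the same model so this block runs on its own
fit <- lm(mpg ~ disp + hp + drat + wt, data = mtcars)

# Residuals against fitted values, with a zero reference line
plot(fitted(fit), resid(fit),
     xlab = "Fitted mpg", ylab = "Residual",
     main = "Residuals vs fitted values")
abline(h = 0, lty = 2)

# Smoothed trend: a clearly curved line hints at non-linearity
lines(lowess(fitted(fit), resid(fit)), col = "red")

# Cars whose standardized residual exceeds the assumed cutoff of 2
std_res <- rstandard(fit)
flagged <- names(std_res)[abs(std_res) > 2]
print(flagged)
```

Any cars printed at the end are candidates for a closer look before deciding whether they are genuine outliers.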

qqnorm(fit$residuals)

The residuals are not normally distributed. One possible explanation is that weight and horsepower are not the only predictors of mileage, so we may need to add more predictors to get a good fit.
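
The Q-Q plot is easier to read with a reference line, and a formal test can back up the visual impression. The sketch below adds qqline() and runs a Shapiro-Wilk test on the residuals; the choice of that test is an assumption made for illustration, and the names res and sw are introduced only for this example.

```r
# Refit the same model so this block runs on its own
fit <- lm(mpg ~ disp + hp + drat + wt, data = mtcars)
res <- resid(fit)

# Q-Q plot with a reference line through the first and third quartiles
qqnorm(res)
qqline(res, lty = 2)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(res)
print(sw)
```

A small p-value would be evidence against normality. With only 32 observations the test has limited power, so it is best read together with the plot.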