First, we load the dataset and inspect its structure.

data(mtcars)

# check the structure of the dataset
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

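Next, we build a linear regression model to predict mpg (miles per gallon) from five other variables in the dataset: cyl (number of cylinders), disp (displacement, cu. in.), hp (gross horsepower), wt (weight, 1000 lbs), and qsec (quarter-mile time).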

model <- lm(mpg ~ cyl + disp + hp + wt + qsec, data = mtcars)

# summary of the model
summary(model)
## 
## Call:
## lm(formula = mpg ~ cyl + disp + hp + wt + qsec, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.3117 -1.3483 -0.4352  1.2603  5.6094 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 35.87361    9.91809   3.617  0.00126 **
## cyl         -1.15608    0.71525  -1.616  0.11809   
## disp         0.01195    0.01191   1.004  0.32484   
## hp          -0.01584    0.01527  -1.037  0.30908   
## wt          -4.22527    1.25239  -3.374  0.00233 **
## qsec         0.25382    0.48746   0.521  0.60699   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.547 on 26 degrees of freedom
## Multiple R-squared:  0.8502, Adjusted R-squared:  0.8214 
## F-statistic: 29.51 on 5 and 26 DF,  p-value: 6.182e-10
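
The printed summary can also be read programmatically, which makes it easier to see how uncertain each estimate is. The following is a minimal sketch using only base R; the names `fit`, `coef_table`, `ci`, and `p_values` are ours and not part of the original analysis.

```r
# Refit the same model so this block is self-contained
data(mtcars)
fit <- lm(mpg ~ cyl + disp + hp + wt + qsec, data = mtcars)

# Coefficient table as a numeric matrix
# (columns: Estimate, Std. Error, t value, Pr(>|t|))
coef_table <- coef(summary(fit))

# 95% confidence intervals for each coefficient
ci <- confint(fit, level = 0.95)
print(ci)

# Predictors with p-value below 0.05 (intercept row dropped)
p_values <- coef_table[-1, "Pr(>|t|)"]
print(names(p_values)[p_values < 0.05])

# Fit statistics reported at the bottom of summary()
print(summary(fit)$r.squared)
print(summary(fit)$adj.r.squared)
```

In the summary above, wt is the only predictor flagged at the 0.05 level, so the confidence intervals for the other four coefficients should be expected to include zero.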

Now, we will conduct residual analysis.

# histogram of the residuals
hist(resid(model))

# standard diagnostic plots for the fitted model:
# residuals vs. fitted, normal Q-Q, scale-location, residuals vs. leverage
par(mfrow = c(2, 2))
plot(model)
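
Note that `plot(model)` draws the standard diagnostic panels for an `lm` fit, not the residuals against each individual predictor. If per-predictor plots are wanted, one possible sketch in base R is shown here; the names `fit`, `predictors`, `res`, and `old_par` are illustrative.

```r
# Refit the same model so this block is self-contained
data(mtcars)
fit <- lm(mpg ~ cyl + disp + hp + wt + qsec, data = mtcars)

predictors <- c("cyl", "disp", "hp", "wt", "qsec")
res <- resid(fit)

# 2 x 3 grid: five predictor panels, the sixth cell stays empty
old_par <- par(mfrow = c(2, 3))
for (p in predictors) {
  plot(mtcars[[p]], res,
       xlab = p, ylab = "Residuals",
       main = paste("Residuals vs.", p))
  abline(h = 0, lty = 2)  # reference line at zero
}
par(old_par)  # restore the previous plotting layout
```

A curved or funnel-shaped pattern in any panel would suggest that the corresponding predictor needs a transformation or that the error variance is not constant.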

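Residual analysis assesses whether the assumptions of linear regression are met: a linear relationship between the predictors and the response, independent errors, constant error variance, and approximately normally distributed residuals.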
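
Some of these assumptions can also be checked more directly. The sketch below, which is our addition and assumes only base R, pairs a normal Q-Q plot with a Shapiro-Wilk test for normality and plots residuals against fitted values for linearity and constant variance. The names `fit`, `res`, and `sw` are illustrative.

```r
# Refit the same model so this block is self-contained
data(mtcars)
fit <- lm(mpg ~ cyl + disp + hp + wt + qsec, data = mtcars)
res <- resid(fit)

# Normality: points close to the line suggest roughly normal residuals
qqnorm(res)
qqline(res)

# Shapiro-Wilk test; a small p-value is evidence against normality
sw <- shapiro.test(res)
print(sw$p.value)

# Linearity and constant variance: look for an even band around zero
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

With only 32 observations the test has limited power, so it is best read alongside the plots and not as a verdict on its own.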

The residuals show no clear violation of these assumptions, so the model is appropriate for these data.