BACKWARD STEPWISE REGRESSION is a stepwise regression approach that begins with the full model, containing every candidate predictor, and removes one variable at each step to find a reduced model that best explains the data. It is also known as backward elimination.
The stepwise approach is useful because it reduces the number of predictors, which can lessen multicollinearity and is one way to reduce overfitting.
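The elimination loop described above can be sketched in a few lines of base R. This is a minimal illustration, not how stepAIC() is implemented: `backward_eliminate` is a hypothetical helper name, and the built-in mtcars data with an arbitrary set of predictors stands in for the crime data so the example runs offline.

```r
# Minimal sketch of backward elimination by AIC, using base R only.
# `backward_eliminate` is a hypothetical helper, not part of MASS.
backward_eliminate <- function(fit) {
  repeat {
    # Stop once only the intercept is left.
    if (length(attr(terms(fit), "term.labels")) == 0) break
    # drop1() scores the current model (row "<none>") and every
    # model obtained by deleting a single term.
    candidates <- drop1(fit)
    current_aic <- candidates["<none>", "AIC"]
    drops <- candidates[rownames(candidates) != "<none>", , drop = FALSE]
    best <- which.min(drops$AIC)
    # Stop when no single removal lowers the AIC.
    if (drops$AIC[best] >= current_aic) break
    # Refit without the term whose removal gives the lowest AIC.
    fit <- update(fit, as.formula(paste(". ~ . -", rownames(drops)[best])))
  }
  fit
}

# Illustration on mtcars (an assumption; not the data used in this post).
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
reduced <- backward_eliminate(full)
print(formula(reduced))
```

Each pass removes at most one term, so the loop runs no more times than there are predictors in the starting model.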
For this post, I will use the training set from Assignment 3.
train <- read.csv("https://raw.githubusercontent.com/irene908/DATA621/main/crime-training-data_modified.csv")
The MASS package provides the function stepAIC(), which can be used for backward elimination. It compares candidate models by their AIC, where a lower value is better.
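library(MASS)  # provides stepAIC()
# Intercept-only model: the lower bound of the search
back_simple <- lm(target ~ 1, data=train)
# Full model with every predictor: the starting point
back_full <- lm(target ~ ., data=train)
stepAIC(back_full, scope=list(lower=back_simple, upper=back_full), direction='backward')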
## Start: AIC=-1072.6
## target ~ zn + indus + chas + nox + rm + age + dis + rad + tax +
## ptratio + lstat + medv
##
##           Df Sum of Sq    RSS     AIC
## - chas     1    0.0010 44.110 -1074.6
## - indus    1    0.0517 44.161 -1074.0
## - rm       1    0.0612 44.171 -1074.0
## - dis      1    0.0765 44.186 -1073.8
## - zn       1    0.1021 44.211 -1073.5
## - tax      1    0.1105 44.220 -1073.4
## - lstat    1    0.1309 44.240 -1073.2
## - ptratio  1    0.1482 44.258 -1073.0
## <none>                 44.109 -1072.6
## - medv     1    0.8622 44.972 -1065.6
## - age      1    1.1988 45.308 -1062.1
## - rad      1    2.2167 46.326 -1051.8
## - nox      1    5.4647 49.574 -1020.2
##
## Step: AIC=-1074.59
## target ~ zn + indus + nox + rm + age + dis + rad + tax + ptratio +
## lstat + medv
##
##           Df Sum of Sq    RSS     AIC
## - indus    1    0.0540 44.164 -1076.0
## - rm       1    0.0609 44.171 -1076.0
## - dis      1    0.0771 44.188 -1075.8
## - zn       1    0.1022 44.213 -1075.5
## - tax      1    0.1143 44.225 -1075.4
## - lstat    1    0.1308 44.241 -1075.2
## - ptratio  1    0.1474 44.258 -1075.0
## <none>                 44.110 -1074.6
## - medv     1    0.8851 44.995 -1067.3
## - age      1    1.2031 45.313 -1064.0
## - rad      1    2.2364 46.347 -1053.5
## - nox      1    5.5026 49.613 -1021.8
##
## Step: AIC=-1076.02
## target ~ zn + nox + rm + age + dis + rad + tax + ptratio + lstat +
## medv
##
##           Df Sum of Sq    RSS     AIC
## - rm       1    0.0508 44.215 -1077.5
## - dis      1    0.0571 44.221 -1077.4
## - tax      1    0.0681 44.232 -1077.3
## - zn       1    0.1246 44.289 -1076.7
## - lstat    1    0.1384 44.303 -1076.6
## - ptratio  1    0.1745 44.339 -1076.2
## <none>                 44.164 -1076.0
## - medv     1    0.8980 45.062 -1068.6
## - age      1    1.1987 45.363 -1065.5
## - rad      1    2.2339 46.398 -1055.0
## - nox      1    6.3555 50.520 -1015.4
##
## Step: AIC=-1077.49
## target ~ zn + nox + age + dis + rad + tax + ptratio + lstat +
## medv
##
##           Df Sum of Sq    RSS     AIC
## - dis      1    0.0574 44.272 -1078.9
## - tax      1    0.0742 44.289 -1078.7
## - lstat    1    0.0990 44.314 -1078.4
## - zn       1    0.1094 44.325 -1078.3
## - ptratio  1    0.1651 44.380 -1077.8
## <none>                 44.215 -1077.5
## - medv     1    1.2223 45.437 -1066.8
## - age      1    1.3556 45.571 -1065.4
## - rad      1    2.3831 46.598 -1055.0
## - nox      1    6.3272 50.542 -1017.2
##
## Step: AIC=-1078.88
## target ~ zn + nox + age + rad + tax + ptratio + lstat + medv
##
##           Df Sum of Sq    RSS     AIC
## - zn       1    0.0631 44.336 -1080.2
## - lstat    1    0.0806 44.353 -1080.0
## - tax      1    0.0923 44.365 -1079.9
## - ptratio  1    0.1570 44.429 -1079.2
## <none>                 44.272 -1078.9
## - medv     1    1.1833 45.456 -1068.6
## - age      1    1.3078 45.580 -1067.3
## - rad      1    2.4310 46.704 -1056.0
## - nox      1    6.9646 51.237 -1012.8
##
## Step: AIC=-1080.22
## target ~ nox + age + rad + tax + ptratio + lstat + medv
##
##           Df Sum of Sq    RSS     AIC
## - lstat    1    0.0800 44.416 -1081.4
## - tax      1    0.1266 44.462 -1080.9
## <none>                 44.336 -1080.2
## - ptratio  1    0.2574 44.593 -1079.5
## - medv     1    1.1597 45.495 -1070.2
## - age      1    1.5927 45.928 -1065.8
## - rad      1    2.4722 46.808 -1056.9
## - nox      1    7.9023 52.238 -1005.8
##
## Step: AIC=-1081.38
## target ~ nox + age + rad + tax + ptratio + medv
##
##           Df Sum of Sq    RSS     AIC
## - tax      1    0.1256 44.541 -1082.1
## <none>                 44.416 -1081.4
## - ptratio  1    0.2325 44.648 -1080.9
## - medv     1    1.3210 45.737 -1069.7
## - age      1    2.0695 46.485 -1062.2
## - rad      1    2.5773 46.993 -1057.1
## - nox      1    8.0243 52.440 -1006.0
##
## Step: AIC=-1082.06
## target ~ nox + age + rad + ptratio + medv
##
##           Df Sum of Sq    RSS     AIC
## <none>                 44.541 -1082.1
## - ptratio  1    0.2101 44.751 -1081.9
## - medv     1    1.5678 46.109 -1067.9
## - age      1    2.0560 46.597 -1063.0
## - rad      1    5.1739 49.715 -1032.8
## - nox      1    7.9634 52.505 -1007.4
##
## Call:
## lm(formula = target ~ nox + age + rad + ptratio + medv, data = train)
##
## Coefficients:
## (Intercept)          nox          age          rad      ptratio         medv
##   -1.412836     1.956694     0.003532     0.017107     0.012716     0.008021
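
To read the AIC column in the trace above: for a linear model, stepAIC() scores each candidate with the value returned by extractAIC(), which equals the usual AIC up to an additive constant. As a rough check, assume the training set has n = 466 rows (a figure not stated in this post, chosen because it reproduces the printed values). The starting model has p = 13 coefficients (12 predictors plus the intercept) and RSS = 44.109, which gives:

```latex
\begin{aligned}
\mathrm{AIC} &= n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2p \\
             &= 466 \ln\!\left(\frac{44.109}{466}\right) + 2(13) \\
             &\approx -1098.6 + 26 = -1072.6
\end{aligned}
```

Under the same assumption, RSS = 44.541 and p = 6 give about -1082.1 for the final model. Dropping seven predictors raised the RSS only slightly while the penalty term fell by 14, which is why the reduced model scores better.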