Forward selection

Forward selection begins with a model that includes no predictors (the intercept-only model). Variables are then added to the model one at a time until no remaining variable improves the model according to a chosen criterion, such as AIC. At each step, the variable that gives the biggest improvement to the model is added. Once a variable is in the model, it remains there.
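
The procedure described above can be sketched by hand. The following is a minimal illustration, assuming AIC is the criterion and using base R's add1() and update() on the birthwt data from the MASS package; the names upper, fit, cand and best are hypothetical helpers chosen for this sketch, not part of any library.

```r
library(MASS)   # provides the birthwt data
data(birthwt)

# Candidate predictors to choose from (an assumption for this sketch)
upper <- bwt ~ lwt + race + smoke + ptl + ht + ui + ftv
candidates <- all.vars(upper)[-1]        # drop the response, bwt

fit <- lm(bwt ~ 1, data = birthwt)       # start from the intercept-only model
repeat {
  # Stop if every candidate is already in the model
  if (length(setdiff(candidates, attr(terms(fit), "term.labels"))) == 0) break
  cand <- add1(fit, scope = upper)       # AIC for each one-variable addition
  best <- rownames(cand)[which.min(cand$AIC)]
  if (best == "<none>") break            # no addition lowers the AIC: stop
  # Add the best variable; it stays in the model from now on
  fit <- update(fit, as.formula(paste(". ~ . +", best)))
}
formula(fit)
```

Each pass through the loop corresponds to one "Step" in the output of an automated forward search.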

The function stepAIC() in the R package MASS can be used to conduct forward selection. For the birth weight example, the R code is shown below. Note that forward selection stops when adding any of the remaining predictors would no longer decrease the AIC.

library(MASS)
data(birthwt)
 
 null<-lm(bwt~ 1, data=birthwt) # 1 here means the intercept 
 full<-lm(bwt~ lwt+race+smoke+ptl+ht+ui+ftv, data=birthwt)
 
 stepAIC(null, scope=list(lower=null, upper=full), 
           data=birthwt, direction='forward')
## Start:  AIC=2492.76
## bwt ~ 1
## 
##         Df Sum of Sq      RSS    AIC
## + ui     1   8059031 91910625 2478.9
## + race   1   3790184 96179472 2487.5
## + smoke  1   3625946 96343710 2487.8
## + lwt    1   3448639 96521017 2488.1
## + ptl    1   2391041 97578614 2490.2
## + ht     1   2130425 97839231 2490.7
## <none>               99969656 2492.8
## + ftv    1    339993 99629663 2494.1
## 
## Step:  AIC=2478.88
## bwt ~ ui
## 
##         Df Sum of Sq      RSS    AIC
## + race   1   3230127 88680498 2474.1
## + ht     1   3162595 88748030 2474.3
## + smoke  1   2996636 88913988 2474.6
## + lwt    1   2074421 89836203 2476.6
## <none>               91910625 2478.9
## + ptl    1    854664 91055961 2479.1
## + ftv    1    172098 91738526 2480.5
## 
## Step:  AIC=2474.11
## bwt ~ ui + race
## 
##         Df Sum of Sq      RSS    AIC
## + smoke  1   6253241 82427257 2462.3
## + ht     1   3000965 85679533 2469.6
## + lwt    1   1367676 87312822 2473.2
## <none>               88680498 2474.1
## + ptl    1    869259 87811239 2474.2
## + ftv    1     59737 88620761 2476.0
## 
## Step:  AIC=2462.29
## bwt ~ ui + race + smoke
## 
##        Df Sum of Sq      RSS    AIC
## + ht    1   2739963 79687294 2457.9
## + lwt   1    868170 81559088 2462.3
## <none>              82427257 2462.3
## + ptl   1    220563 82206694 2463.8
## + ftv   1      8390 82418867 2464.3
## 
## Step:  AIC=2457.9
## bwt ~ ui + race + smoke + ht
## 
##        Df Sum of Sq      RSS    AIC
## + lwt   1   1846738 77840556 2455.5
## <none>              79687294 2457.9
## + ptl   1    214476 79472818 2459.4
## + ftv   1      1134 79686160 2459.9
## 
## Step:  AIC=2455.47
## bwt ~ ui + race + smoke + ht + lwt
## 
##        Df Sum of Sq      RSS    AIC
## <none>              77840556 2455.5
## + ptl   1    108936 77731620 2457.2
## + ftv   1     49231 77791325 2457.3
## 
## Call:
## lm(formula = bwt ~ ui + race + smoke + ht + lwt, data = birthwt)
## 
## Coefficients:
## (Intercept)           ui         race        smoke           ht  
##    3104.438     -523.419     -187.849     -366.135     -595.820  
##         lwt  
##       3.434
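
The AIC values printed in the tables above can be checked by hand. The sketch below assumes the values come from extractAIC(), which for a linear model computes n * log(RSS / n) + 2 * edf, where edf is the number of estimated coefficients. This differs from the value returned by AIC() by an additive constant, so only differences between models are meaningful. The names n, rss, edf and aic_manual are introduced here for illustration.

```r
library(MASS)   # provides the birthwt data
data(birthwt)

null <- lm(bwt ~ 1, data = birthwt)   # intercept-only model

n   <- nrow(birthwt)                  # number of observations
rss <- sum(resid(null)^2)             # residual sum of squares
edf <- n - df.residual(null)          # number of estimated coefficients

aic_manual <- n * log(rss / n) + 2 * edf
aic_manual                            # compare with "Start:  AIC=2492.76"
extractAIC(null)                      # returns c(edf, AIC)
```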

Backward elimination

Backward elimination begins with a model that includes all candidate variables. Variables are then deleted from the model one at a time until removing any remaining variable would no longer improve the model according to a chosen criterion (for example, until all remaining variables are statistically significant). At each step, the variable that contributes least to the model is deleted. Once a variable is deleted, it cannot come back into the model.
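
A single elimination step can be illustrated with base R's drop1(), which reports the criterion for each one-variable deletion. This is a minimal sketch, assuming AIC is the criterion and using the birthwt data from the MASS package; the names step1 and weakest are hypothetical and used only for this example.

```r
library(MASS)   # provides the birthwt data
data(birthwt)

# Full model with all candidate predictors
full <- lm(bwt ~ lwt + race + smoke + ptl + ht + ui + ftv, data = birthwt)

step1 <- drop1(full)             # AIC after deleting each variable in turn
step1[order(step1$AIC), ]        # smallest AIC first

# The deletion that gives the lowest AIC is the first candidate to remove;
# "<none>" here would mean that no deletion improves the model.
weakest <- rownames(step1)[which.min(step1$AIC)]
weakest
```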

The R package MASS has a function stepAIC() that can be used to conduct backward elimination. To use the function, one first needs to define a null model and a full model. The null model is typically a model without any predictors (the intercept-only model), and the full model is often the one with all the candidate predictors included. For the birth weight example, the R code is shown below. Note that backward elimination here is based on AIC: it stops when removing any of the remaining predictors would increase the AIC.

library(MASS)
data(birthwt)

 null<-lm(bwt~ 1, data=birthwt) # 1 here means the intercept 
 
 full<-lm(bwt~ lwt+race+smoke+ptl+ht+ui+ftv, data=birthwt)
 
 stepAIC(full, scope=list(lower=null, upper=full), data=birthwt, direction='backward')
## Start:  AIC=2459.09
## bwt ~ lwt + race + smoke + ptl + ht + ui + ftv
## 
##         Df Sum of Sq      RSS    AIC
## - ftv    1     50343 77731620 2457.2
## - ptl    1    110047 77791325 2457.3
## <none>               77681278 2459.1
## - lwt    1   1789656 79470934 2461.4
## - ht     1   3731126 81412404 2465.9
## - race   1   4707970 82389248 2468.2
## - smoke  1   4843734 82525012 2468.5
## - ui     1   5749594 83430871 2470.6
## 
## Step:  AIC=2457.21
## bwt ~ lwt + race + smoke + ptl + ht + ui
## 
##         Df Sum of Sq      RSS    AIC
## - ptl    1    108936 77840556 2455.5
## <none>               77731620 2457.2
## - lwt    1   1741198 79472818 2459.4
## - ht     1   3681167 81412788 2463.9
## - race   1   4660187 82391807 2466.2
## - smoke  1   4810582 82542203 2466.6
## - ui     1   5716074 83447695 2468.6
## 
## Step:  AIC=2455.47
## bwt ~ lwt + race + smoke + ht + ui
## 
##         Df Sum of Sq      RSS    AIC
## <none>               77840556 2455.5
## - lwt    1   1846738 79687294 2457.9
## - ht     1   3718531 81559088 2462.3
## - race   1   4727071 82567628 2464.6
## - smoke  1   5237430 83077987 2465.8
## - ui     1   6302771 84143327 2468.2
## 
## Call:
## lm(formula = bwt ~ lwt + race + smoke + ht + ui, data = birthwt)
## 
## Coefficients:
## (Intercept)          lwt         race        smoke           ht  
##    3104.438        3.434     -187.849     -366.135     -595.820  
##          ui  
##    -523.419
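
In this example the forward and backward searches end at the same model. The sketch below checks this programmatically, using trace = FALSE to suppress the step-by-step output; the names fwd and bwd are introduced here for illustration. Agreement between the two directions is not guaranteed in general, so a check like this is worth running on other data sets.

```r
library(MASS)
data(birthwt)

null <- lm(bwt ~ 1, data = birthwt)
full <- lm(bwt ~ lwt + race + smoke + ptl + ht + ui + ftv, data = birthwt)

fwd <- stepAIC(null, scope = list(lower = null, upper = full),
               direction = "forward", trace = FALSE)
bwd <- stepAIC(full, scope = list(lower = null, upper = full),
               direction = "backward", trace = FALSE)

# Same set of selected predictors, regardless of the order they appear in
setequal(attr(terms(fwd), "term.labels"), attr(terms(bwd), "term.labels"))

# Same coefficient estimates once the names are put in a common order
all.equal(coef(fwd)[sort(names(coef(fwd)))],
          coef(bwd)[sort(names(coef(bwd)))])
```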

Conclusion

If you have a very large set of candidate predictors from which you wish to extract a few (i.e., if you’re on a fishing expedition), you should generally go forward. If, on the other hand, you have a modest-sized set of potential variables from which you wish to eliminate a few (i.e., if you’re fine-tuning some prior selection of variables), you should generally go backward. If you’re on a fishing expedition, you should still be careful not to cast too wide a net and select variables that are only accidentally related to your dependent variable.