In my previous blog post I discussed backward elimination. In this post I will discuss forward selection, using the same training dataset from assignment 3.
Forward selection typically begins with a model that contains only an intercept. Each candidate variable is tested in turn, and the "best" one, where "best" is judged by a pre-determined criterion such as AIC, is added to the model.
As long as the model keeps improving by that same criterion, we continue the process, adding one variable at a time and re-testing at each step. Once adding another variable no longer improves the model, the process stops.
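The loop described above can be sketched by hand. This is only an illustration under stated assumptions: the data frame `sim` is simulated, `forward_select()` is a hypothetical helper written for this post, and AIC is assumed as the criterion. Base R's `AIC()` is used here, so the values may differ by a constant from those that `stepAIC()` prints, although the ranking of candidate models for the same data is the same.

```r
# Simulated data (hypothetical): y depends on x1 and x2; x3 is pure noise.
set.seed(42)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 2 * sim$x1 - 1.5 * sim$x2 + rnorm(n)

# Hypothetical helper: greedy forward selection using AIC as the criterion.
forward_select <- function(data, response) {
  candidates <- setdiff(names(data), response)
  selected <- character(0)
  # Start from the intercept-only model.
  current <- lm(as.formula(paste(response, "~ 1")), data = data)
  while (length(candidates) > 0) {
    # Fit one trial model per remaining candidate and record its AIC.
    trial_aic <- sapply(candidates, function(v) {
      rhs <- paste(c(selected, v), collapse = " + ")
      AIC(lm(as.formula(paste(response, "~", rhs)), data = data))
    })
    # Stop when even the best candidate fails to lower the AIC.
    if (min(trial_aic) >= AIC(current)) break
    best <- names(which.min(trial_aic))
    selected <- c(selected, best)
    candidates <- setdiff(candidates, best)
    rhs <- paste(selected, collapse = " + ")
    current <- lm(as.formula(paste(response, "~", rhs)), data = data)
  }
  current
}

fit <- forward_select(sim, "y")
formula(fit)
```

Each pass adds at most one variable, and the loop ends either when no candidate lowers the AIC or when every candidate has been added.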
train <- read.csv("https://raw.githubusercontent.com/irene908/DATA621/main/crime-training-data_modified.csv")
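The MASS package provides the stepAIC() function, which performs stepwise selection by AIC. Setting direction='forward' makes it only add terms, which is what forward selection needs.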
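library(MASS)  # provides stepAIC()
# Intercept-only model: the lower bound of the search scope
back_simple <- lm(target ~ 1, data=train)
# Model with every predictor: the upper bound of the search scope
back_full <- lm(target ~ ., data=train)
# The first argument (here back_full) is the model the search starts from
stepAIC(back_full, scope=list(lower=back_simple, upper=back_full), data=train, direction='forward')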
## Start: AIC=-1072.6
## target ~ zn + indus + chas + nox + rm + age + dis + rad + tax +
## ptratio + lstat + medv
##
## Call:
## lm(formula = target ~ zn + indus + chas + nox + rm + age + dis +
## rad + tax + ptratio + lstat + medv, data = train)
##
## Coefficients:
## (Intercept)           zn        indus         chas          nox           rm
##  -1.6013725   -0.0009668    0.0031277    0.0059892    1.9722476    0.0249823
##         age          dis          rad          tax      ptratio        lstat
##   0.0031738    0.0125382    0.0207000   -0.0002787    0.0115287    0.0045124
##        medv
##   0.0089246
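The trace above starts at the model with all twelve predictors (Start: AIC=-1072.6), so a forward search has nothing left to add and the full model comes back unchanged. A minimal sketch of starting from the intercept-only model instead is given below. It assumes the same CSV is reachable. The names `null_model`, `full_model` and `forward_fit` are introduced here for illustration. No output is shown, and the selected model may differ from the one printed above.

```r
library(MASS)

train <- read.csv("https://raw.githubusercontent.com/irene908/DATA621/main/crime-training-data_modified.csv")

# Intercept-only model: the starting point of the forward search.
null_model <- lm(target ~ 1, data = train)
# Full model: used only to supply the upper bound of the scope.
full_model <- lm(target ~ ., data = train)

# Start from null_model and let stepAIC() add one term per step
# until no addition lowers the AIC.
forward_fit <- stepAIC(null_model,
                       scope = list(lower = formula(null_model),
                                    upper = formula(full_model)),
                       direction = "forward")
formula(forward_fit)
```

With `trace` left at its default, stepAIC() prints the AIC table for every step, which makes it easy to see the order in which the variables enter.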