Building Regression Models
In Tuesday's class we discussed building regression models and the different ways we can compare models. Using different models we can predict our response variable with greater accuracy. In this learning log we will talk about forward stepwise selection, backward elimination, and all subsets as model-building techniques. We will also discuss how we can compare models using t-tests, partial F-tests, AIC, and Mallows' Cp to assess which model is the most accurate. At the end we will apply these methods to data we practiced with in class.
Forward Stepwise Selection
The forward stepwise regression method starts with a small set of predictor variables and then adds and/or removes predictors until the best model is reached. The most straightforward way (pun intended) to begin forward stepwise selection is to start with an intercept-only model and fit an SLR model for each candidate predictor, keeping the most significant one, that is, the predictor with the smallest p-value. With this baseline model we can move forward with adding stronger predictors. Add only one predictor at a time and check whether the model has improved; we can check this using the t-test, F-test, or AIC. Usually we need to set an entry threshold that a new predictor must clear to be added to the model, and it is best to add the strongest remaining predictor to avoid using unnecessary ones. When a new predictor is added, the current predictors should be re-tested for significance against an exit threshold; if a current predictor fails the exit threshold, it should be removed. The process is complete when no remaining predictor is significant enough to enter and every predictor in the model is significant enough to stay.
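To see this in action, here is a rough sketch using R's built-in step() function. Note that step() ranks models by AIC rather than explicit p-value thresholds, and the mtcars data and predictors below are just placeholders, not our class data.

null_mod <- lm(mpg ~ 1, data = mtcars)      # intercept-only baseline
step(null_mod,
     scope = ~ wt + hp + disp + qsec,       # pool of candidate predictors
     direction = "forward")                 # add one predictor at a time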
Backward Elimination
The backward elimination method is simply the opposite of forward stepwise selection: we start with the full model containing every candidate predictor and, one at a time, remove the least significant predictor that fails the exit threshold, refitting after each removal, until every remaining predictor is significant.
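A matching sketch with step(), again on placeholder mtcars data:

full_mod <- lm(mpg ~ wt + hp + disp + qsec, data = mtcars)  # start with everything
step(full_mod, direction = "backward")      # drop the weakest predictor each step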
All Subsets
Our last method is all subsets. For this method, a different model is created for every possible combination of the listed predictors. So if we have x1, x2, and x3 as our predictors, the following models will be created under the subsets method:

x1
x2
x3
x1 + x2
x1 + x3
x2 + x3
x1 + x2 + x3

All these models are compared against each other, and the model with the highest accuracy in predicting the response is selected.
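The leaps package automates this search (an assumption on my part; it was not part of our class code). A minimal sketch on placeholder mtcars data:

library(leaps)
subs <- regsubsets(mpg ~ wt + hp + disp, data = mtcars, nvmax = 3)
summary(subs)$which   # which predictors appear in the best model of each size
summary(subs)$cp      # Mallows' Cp for each of those models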
Comparing Models
To compare models we can use the t-test, partial F-test, AIC, or Mallows' Cp.
T-test
When we want to add and/or remove a predictor from our model, we can test the significance of that predictor with a t-test at a certain alpha level. That significance level serves as our threshold for entry to or exit from the model. We only use the t-test on the single new predictor being added or the current predictor being removed, and t-tests can only compare nested models.
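In R the coefficient t-tests come straight out of summary(); a sketch on placeholder mtcars data:

mod <- lm(mpg ~ wt + hp, data = mtcars)
summary(mod)$coefficients   # "t value" and "Pr(>|t|)" columns test each predictor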
Partial F-test
In MLR situations where we have t-tests, we can also use partial F-tests. Like t-tests, the partial F-test tells us whether the predictors added to or removed from the model are significant at a given significance level, that is, whether those predictors clear the entry threshold or fall below the exit threshold. The partial F-test also requires nested models.
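R's anova() function runs the partial F-test when given two nested models; a sketch on placeholder mtcars data:

reduced <- lm(mpg ~ wt, data = mtcars)              # nested model
full    <- lm(mpg ~ wt + hp + disp, data = mtcars)  # full model
anova(reduced, full)   # small p-value -> the added predictors are significant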
AIC
AIC stands for Akaike Information Criterion and can be calculated with the formula AIC = 2(k+1) - 2 log(likelihood). The first term is a penalty counting the number of parameters, and from it we subtract the second term, which measures the fit of the model. A smaller AIC value corresponds to a better-fitted model, which is our goal. Unlike the previous two tests, we do not actually need nested models for this.
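R computes AIC directly; a sketch comparing two candidate models on placeholder mtcars data:

mod_a <- lm(mpg ~ wt, data = mtcars)
mod_b <- lm(mpg ~ wt + hp, data = mtcars)
AIC(mod_a, mod_b)   # the lower AIC wins; the models do not need to be nested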
Mallows' Cp
The next comparison is Mallows' Cp, given by Cp = SSE_k / MSE_p - (n - 2(k+1)). The SSE_k in the numerator is the sum of squared residuals for the nested model with k predictors, while the MSE_p in the denominator is the mean squared error of the full model with p predictors. We want Cp to be relatively small, with a value close to k+1 as our goal.
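As a sketch of the computation on placeholder mtcars data (with wt as the lone predictor in the nested model):

full_mod <- lm(mpg ~ wt + hp + disp, data = mtcars)  # full model, p = 3 predictors
sub_mod  <- lm(mpg ~ wt, data = mtcars)              # nested model, k = 1 predictor
n <- nrow(mtcars)
k <- 1
sse_k <- sum(resid(sub_mod)^2)        # SSE for the k-predictor model
mse_p <- summary(full_mod)$sigma^2    # MSE from the full model
sse_k / mse_p - (n - 2 * (k + 1))     # a good model has Cp close to k + 1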
library(alr3)
## Loading required package: car
data(brains)
attach(brains)
mod1 <- lm(BrainWt ~ BodyWt)   # SLR of brain weight on body weight
plot(mod1)                     # standard lm diagnostic plots
There are issues with the plots: outlier data points are affecting the residuals-vs-fitted, normal Q-Q, and scale-location plots.
sqBrainWt <- sqrt(BrainWt)     # square-root transform of the response
sqmod <- lm(sqBrainWt ~ BodyWt)
plot(sqmod)
The same issues as above are still present.
logBrainWt <- log(BrainWt)     # log transform of the response
logmod <- lm(logBrainWt ~ BodyWt)
plot(logmod)
The outlying data points are still a problem, but the log transformation does appear to improve the diagnostic plots by a small amount.
invBrainWt <- 1/(BrainWt)      # inverse transform of the response
invmod <- lm(invBrainWt ~ BodyWt)
plot(invmod)
The residuals-vs-fitted plot is very concentrated in one spot, and the normal Q-Q points do not follow the line closely.
brains[33, ]   # inspect observation 33, flagged in the diagnostic plots
## BrainWt BodyWt
## African_elephant 5711.86 6654.18
Since observation 33 falls outside the Cook's distance contours on the residuals-vs-leverage plot, we can say it has a lot of influence on the fitted model.
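We can also check this numerically with Cook's distance; a sketch using the first model fit above:

cd <- cooks.distance(mod1)
which.max(cd)   # expect observation 33, the African elephant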