Exercise 8.3 considers a model that predicts a newborn’s weight using several predictors (gestation length, parity, age of mother, height of mother, weight of mother, smoking status of mother). The table below shows the adjusted R-squared for the full model as well as the adjusted R-squared values for all models we evaluate in the first step of the backward elimination process.
Which, if any, variable should be removed from the model first?
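library(knitr)  # provides kable() for table output
# Adjusted R-squared for the full model and for each model with one predictor removed
Model <- c("Full_model", "No_gestation", "No_parity", "No_age", "No_height", "No_weight", "No_smoking_status")
R_squared <- c(0.2541, 0.1031, 0.2492, 0.2547, 0.2311, 0.2536, 0.2072)
df <- data.frame(Model, R_squared)
# Sort from highest to lowest adjusted R-squared
kable(df[order(df$R_squared, decreasing = TRUE), ])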
| | Model | R_squared |
|---|---|---|
| 4 | No_age | 0.2547 |
| 1 | Full_model | 0.2541 |
| 6 | No_weight | 0.2536 |
| 3 | No_parity | 0.2492 |
| 5 | No_height | 0.2311 |
| 7 | No_smoking_status | 0.2072 |
| 2 | No_gestation | 0.1031 |
Six of the seven models have an adjusted \(R^2\) between 20% and 25.5%; only the model without gestation falls to about 10%, which shows that gestation length is by far the most informative predictor.
In backward elimination based on adjusted \(R^2\), we compare each reduced model with the full model and drop the variable whose removal gives the largest increase in \(R^2_{adj}\); if no removal increases it, we keep the full model.
Here the “No age” model is the only one whose \(R^2_{adj}\) (0.2547) exceeds that of the full model (0.2541), so the mother’s age should be removed from the model first.
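As a quick check, the comparison against the full model can be sketched in R. This is only an illustration built from the values in the table; the names `adj_r2`, `candidates`, `delta`, and `to_drop` are assumptions introduced here and are not part of the exercise.

```r
# Adjusted R-squared values copied from the table in this exercise
adj_r2 <- c(Full_model = 0.2541, No_gestation = 0.1031, No_parity = 0.2492,
            No_age = 0.2547, No_height = 0.2311, No_weight = 0.2536,
            No_smoking_status = 0.2072)

full <- adj_r2[["Full_model"]]
candidates <- adj_r2[names(adj_r2) != "Full_model"]

# Change in adjusted R-squared when each variable is dropped
delta <- candidates - full
print(sort(delta, decreasing = TRUE))

# Pick the best reduced model; drop its variable only if it beats the full model
best <- names(which.max(candidates))
to_drop <- if (candidates[[best]] > full) sub("^No_", "", best) else NA_character_
to_drop
```

With these values only the change for `No_age` is positive, so `to_drop` evaluates to `"age"`. If no reduced model improved on the full model, `to_drop` would be `NA` and the elimination would stop.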