Lab 8

Sorting

Let’s say you have a dataframe as follows:

data(iris)
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

How can we easily find the flower with the longest petal?

order()

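order() takes a vector and returns the indices that would sort it in ascending order. In the example below, the smallest value (1) sits at position 2, the next (2) at position 3, and the largest (3) at position 1.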

order(c(3, 1, 2))
[1] 2 3 1

You can also set decreasing = TRUE to get the indices for descending order:

order(c(3, 1, 2), decreasing = TRUE)
[1] 1 3 2
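
A quick way to check what order() returns is to use its result as an index: indexing the vector by those positions should give the sorted values. The sketch below is only an illustration; the names x and idx are placeholders that do not appear elsewhere in this lab.

```r
x <- c(3, 1, 2)

# order() gives positions, not values
idx <- order(x)
idx                              # 2 3 1

# Indexing by those positions yields the sorted vector
x[idx]                           # 1 2 3
identical(x[idx], sort(x))       # TRUE

# The same idea works in descending order
x[order(x, decreasing = TRUE)]   # 3 2 1
```

This is exactly the trick used to sort a data frame: the indices from one column are used to reorder all of the rows.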

Sorting a Dataframe by a column

First, find the descending sort order of the petal lengths.

sort_order <- order(iris$Petal.Length, decreasing = TRUE)

On their own, these indices don’t mean much to us, so let’s use them to reorder the rows of the original data frame. The flower with the longest petal then appears in the first row.

iris[sort_order, ]
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
119          7.7         2.6          6.9         2.3  virginica
118          7.7         3.8          6.7         2.2  virginica
123          7.7         2.8          6.7         2.0  virginica
106          7.6         3.0          6.6         2.1  virginica
132          7.9         3.8          6.4         2.0  virginica
108          7.3         2.9          6.3         1.8  virginica
110          7.2         3.6          6.1         2.5  virginica
131          7.4         2.8          6.1         1.9  virginica
136          7.7         3.0          6.1         2.3  virginica
101          6.3         3.3          6.0         2.5  virginica
126          7.2         3.2          6.0         1.8  virginica
103          7.1         3.0          5.9         2.1  virginica
144          6.8         3.2          5.9         2.3  virginica
105          6.5         3.0          5.8         2.2  virginica
109          6.7         2.5          5.8         1.8  virginica
130          7.2         3.0          5.8         1.6  virginica
121          6.9         3.2          5.7         2.3  virginica
125          6.7         3.3          5.7         2.1  virginica
145          6.7         3.3          5.7         2.5  virginica
104          6.3         2.9          5.6         1.8  virginica
129          6.4         2.8          5.6         2.1  virginica
133          6.4         2.8          5.6         2.2  virginica
135          6.1         2.6          5.6         1.4  virginica
137          6.3         3.4          5.6         2.4  virginica
141          6.7         3.1          5.6         2.4  virginica
113          6.8         3.0          5.5         2.1  virginica
117          6.5         3.0          5.5         1.8  virginica
138          6.4         3.1          5.5         1.8  virginica
140          6.9         3.1          5.4         2.1  virginica
149          6.2         3.4          5.4         2.3  virginica
112          6.4         2.7          5.3         1.9  virginica
116          6.4         3.2          5.3         2.3  virginica
146          6.7         3.0          5.2         2.3  virginica
148          6.5         3.0          5.2         2.0  virginica
84           6.0         2.7          5.1         1.6 versicolor
102          5.8         2.7          5.1         1.9  virginica
111          6.5         3.2          5.1         2.0  virginica
115          5.8         2.8          5.1         2.4  virginica
134          6.3         2.8          5.1         1.5  virginica
142          6.9         3.1          5.1         2.3  virginica
143          5.8         2.7          5.1         1.9  virginica
150          5.9         3.0          5.1         1.8  virginica
78           6.7         3.0          5.0         1.7 versicolor
114          5.7         2.5          5.0         2.0  virginica
120          6.0         2.2          5.0         1.5  virginica
147          6.3         2.5          5.0         1.9  virginica
53           6.9         3.1          4.9         1.5 versicolor
73           6.3         2.5          4.9         1.5 versicolor
122          5.6         2.8          4.9         2.0  virginica
124          6.3         2.7          4.9         1.8  virginica
128          6.1         3.0          4.9         1.8  virginica
71           5.9         3.2          4.8         1.8 versicolor
77           6.8         2.8          4.8         1.4 versicolor
127          6.2         2.8          4.8         1.8  virginica
139          6.0         3.0          4.8         1.8  virginica
51           7.0         3.2          4.7         1.4 versicolor
57           6.3         3.3          4.7         1.6 versicolor
64           6.1         2.9          4.7         1.4 versicolor
74           6.1         2.8          4.7         1.2 versicolor
87           6.7         3.1          4.7         1.5 versicolor
55           6.5         2.8          4.6         1.5 versicolor
59           6.6         2.9          4.6         1.3 versicolor
92           6.1         3.0          4.6         1.4 versicolor
52           6.4         3.2          4.5         1.5 versicolor
56           5.7         2.8          4.5         1.3 versicolor
67           5.6         3.0          4.5         1.5 versicolor
69           6.2         2.2          4.5         1.5 versicolor
79           6.0         2.9          4.5         1.5 versicolor
85           5.4         3.0          4.5         1.5 versicolor
86           6.0         3.4          4.5         1.6 versicolor
107          4.9         2.5          4.5         1.7  virginica
66           6.7         3.1          4.4         1.4 versicolor
76           6.6         3.0          4.4         1.4 versicolor
88           6.3         2.3          4.4         1.3 versicolor
91           5.5         2.6          4.4         1.2 versicolor
75           6.4         2.9          4.3         1.3 versicolor
98           6.2         2.9          4.3         1.3 versicolor
62           5.9         3.0          4.2         1.5 versicolor
95           5.6         2.7          4.2         1.3 versicolor
96           5.7         3.0          4.2         1.2 versicolor
97           5.7         2.9          4.2         1.3 versicolor
68           5.8         2.7          4.1         1.0 versicolor
89           5.6         3.0          4.1         1.3 versicolor
100          5.7         2.8          4.1         1.3 versicolor
54           5.5         2.3          4.0         1.3 versicolor
63           6.0         2.2          4.0         1.0 versicolor
72           6.1         2.8          4.0         1.3 versicolor
90           5.5         2.5          4.0         1.3 versicolor
93           5.8         2.6          4.0         1.2 versicolor
60           5.2         2.7          3.9         1.4 versicolor
70           5.6         2.5          3.9         1.1 versicolor
83           5.8         2.7          3.9         1.2 versicolor
81           5.5         2.4          3.8         1.1 versicolor
82           5.5         2.4          3.7         1.0 versicolor
65           5.6         2.9          3.6         1.3 versicolor
61           5.0         2.0          3.5         1.0 versicolor
80           5.7         2.6          3.5         1.0 versicolor
58           4.9         2.4          3.3         1.0 versicolor
94           5.0         2.3          3.3         1.0 versicolor
99           5.1         2.5          3.0         1.1 versicolor
25           4.8         3.4          1.9         0.2     setosa
45           5.1         3.8          1.9         0.4     setosa
6            5.4         3.9          1.7         0.4     setosa
19           5.7         3.8          1.7         0.3     setosa
21           5.4         3.4          1.7         0.2     setosa
24           5.1         3.3          1.7         0.5     setosa
12           4.8         3.4          1.6         0.2     setosa
26           5.0         3.0          1.6         0.2     setosa
27           5.0         3.4          1.6         0.4     setosa
30           4.7         3.2          1.6         0.2     setosa
31           4.8         3.1          1.6         0.2     setosa
44           5.0         3.5          1.6         0.6     setosa
47           5.1         3.8          1.6         0.2     setosa
4            4.6         3.1          1.5         0.2     setosa
8            5.0         3.4          1.5         0.2     setosa
10           4.9         3.1          1.5         0.1     setosa
11           5.4         3.7          1.5         0.2     setosa
16           5.7         4.4          1.5         0.4     setosa
20           5.1         3.8          1.5         0.3     setosa
22           5.1         3.7          1.5         0.4     setosa
28           5.2         3.5          1.5         0.2     setosa
32           5.4         3.4          1.5         0.4     setosa
33           5.2         4.1          1.5         0.1     setosa
35           4.9         3.1          1.5         0.2     setosa
40           5.1         3.4          1.5         0.2     setosa
49           5.3         3.7          1.5         0.2     setosa
1            5.1         3.5          1.4         0.2     setosa
2            4.9         3.0          1.4         0.2     setosa
5            5.0         3.6          1.4         0.2     setosa
7            4.6         3.4          1.4         0.3     setosa
9            4.4         2.9          1.4         0.2     setosa
13           4.8         3.0          1.4         0.1     setosa
18           5.1         3.5          1.4         0.3     setosa
29           5.2         3.4          1.4         0.2     setosa
34           5.5         4.2          1.4         0.2     setosa
38           4.9         3.6          1.4         0.1     setosa
46           4.8         3.0          1.4         0.3     setosa
48           4.6         3.2          1.4         0.2     setosa
50           5.0         3.3          1.4         0.2     setosa
3            4.7         3.2          1.3         0.2     setosa
17           5.4         3.9          1.3         0.4     setosa
37           5.5         3.5          1.3         0.2     setosa
39           4.4         3.0          1.3         0.2     setosa
41           5.0         3.5          1.3         0.3     setosa
42           4.5         2.3          1.3         0.3     setosa
43           4.4         3.2          1.3         0.2     setosa
15           5.8         4.0          1.2         0.2     setosa
36           5.0         3.2          1.2         0.2     setosa
14           4.3         3.0          1.1         0.1     setosa
23           4.6         3.6          1.0         0.2     setosa
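
Printing all 150 rows is more than we need to answer the original question. As a minimal sketch, head() can limit the output to the first few sorted rows, and which.max() can return the position of the longest petal directly. The names top3 and longest are placeholders chosen for this example.

```r
data(iris)

# Row positions from longest to shortest petal
sort_order <- order(iris$Petal.Length, decreasing = TRUE)

# Show only the three longest petals instead of all 150 rows
top3 <- head(iris[sort_order, ], 3)
top3

# If only the single longest petal is needed, which.max() gives its row position
longest <- which.max(iris$Petal.Length)
longest            # 119
iris[longest, ]
```

Note that which.max() returns only the first maximum, so order() is still the better choice when ties or a full ranking matter.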

Or, in one line of code, which gives the same sorted data frame:

iris[order(iris$Petal.Length, decreasing = TRUE), ]

Model Selection

Why is Model Selection Important?

So far, we’ve learned how to create multiple linear regression models and compare them to each other using the general linear F-test to decide which one to keep.

df <- read.csv("older_adults.csv")
model_r <- lm(TuG ~ Age, df)
model_f <- lm(TuG ~ Age + Weight + MoCA + Fear_falling, df)
anova(model_r, model_f)
Analysis of Variance Table

Model 1: TuG ~ Age
Model 2: TuG ~ Age + Weight + MoCA + Fear_falling
  Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
1     42 248.39                              
2     39 211.36  3    37.029 2.2775 0.09475 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

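With a p-value of 0.095, the F-test does not reject the reduced model at the usual 0.05 level; the three extra predictors would only be judged worthwhile at a looser level such as 0.10. Either way, this test only compares two specific models. Can we be certain that either of them is the best model to use in this case?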
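
The F statistic in the table can be reproduced by hand from the two residual sums of squares, which makes it clearer what anova() is comparing. The sketch below types in the RSS and residual degrees of freedom from the output above; the variable names are placeholders.

```r
# Values copied from the anova() table above
rss_r <- 248.39   # reduced model: TuG ~ Age
rss_f <- 211.36   # full model: TuG ~ Age + Weight + MoCA + Fear_falling
df_r  <- 42       # residual df, reduced model
df_f  <- 39       # residual df, full model

# General linear F statistic: drop in RSS per extra predictor,
# divided by the full model's mean squared error
f_stat <- ((rss_r - rss_f) / (df_r - df_f)) / (rss_f / df_f)
f_stat            # about 2.28

# Upper-tail probability under the F distribution with (3, 39) df
p_value <- pf(f_stat, df_r - df_f, df_f, lower.tail = FALSE)
p_value           # about 0.095
```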

Model Selection Approaches

One way to check whether we have the best model is to compare some metric (\(p\), \(r^2\), etc.) across all combinations of predictors and see which combination scores best. For example, if we had two predictors, \(x_1\) and \(x_2\), these are the models that we could compare:

  • No predictors

  • \(x_1\) only

  • \(x_2\) only

  • \(x_1\) and \(x_2\)

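Given that there are only four possible models, it would be easy to compare all of them by hand. But what if we had three predictors? Four predictors? Even more? With \(k\) predictors there are \(2^k\) possible models, so the count doubles with every predictor we add.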
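
To see how quickly the list of candidate models grows, the sketch below enumerates every model formula for the four predictors used in model_f. It only builds the formula strings and does not need the data; the names predictors, rhs, and formulas are placeholders for this example.

```r
predictors <- c("Age", "Weight", "MoCA", "Fear_falling")
k <- length(predictors)

# All non-empty combinations of predictors, joined with " + "
rhs <- unlist(lapply(seq_len(k), function(m) {
  combn(predictors, m, FUN = paste, collapse = " + ")
}))

# Add the no-predictor (intercept-only) model
formulas <- c("TuG ~ 1", paste("TuG ~", rhs))
length(formulas)   # 16, which is 2^4
head(formulas)

# Number of possible models for 1 to 10 predictors
2^(1:10)
```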

olsrr

For this section, we will primarily be working with functions from the olsrr package. Install it once with install.packages("olsrr"), then load it.

library(olsrr)

All Possible Regression

Fortunately, olsrr can automatically compare every possible model for us. This is called all possible regression and can be done using the ols_step_all_possible() function.

ols_step_all_possible(model_f)
   Index N                   Predictors    R-Square Adj. R-Square Mallow's Cp
1      1 1                          Age 0.269035144    0.25163122    5.832371
4      2 1                 Fear_falling 0.158755366    0.13872573   12.747045
3      3 1                         MoCA 0.083315782    0.06148997   17.477198
2      4 1                       Weight 0.001426968   -0.02234858   22.611724
7      5 2             Age Fear_falling 0.352244530    0.32064670    2.615043
6      6 2                     Age MoCA 0.283362197    0.24840426    6.934048
5      7 2                   Age Weight 0.282418949    0.24741500    6.993191
10     8 2            MoCA Fear_falling 0.221160195    0.18316801   10.834188
9      9 2          Weight Fear_falling 0.159847588    0.11886454   14.678562
8     10 2                  Weight MoCA 0.089892274    0.04549678   19.064844
13    11 3        Age MoCA Fear_falling 0.364844103    0.31720741    3.825035
12    12 3      Age Weight Fear_falling 0.362694755    0.31489686    3.959802
11    13 3              Age Weight MoCA 0.299965268    0.24746266    7.893016
14    14 3     Weight MoCA Fear_falling 0.226209875    0.16817562   12.517567
15    15 4 Age Weight MoCA Fear_falling 0.378002304    0.31420767    5.000000

This automatically generates \(r^2\), \(r^2_{adj}\), and Mallows’ \(C_p\) for every combination of predictors, ordered by the number of predictors from fewest to most. The no-predictor model is not listed, which is why there are 15 rows rather than 16.
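
To make the idea behind all possible regression concrete, here is a rough base R sketch that fits every subset of a small predictor set and collects a few metrics. It is not how olsrr implements the function. It also uses the built-in mtcars data as a stand-in, since older_adults.csv may not be at hand, and the column names in results are chosen only to resemble the output above.

```r
data(mtcars)
predictors <- c("wt", "hp", "qsec")   # stand-in predictors for mpg

# Every non-empty subset of the predictors, as a list of character vectors
subsets <- unlist(
  lapply(seq_along(predictors), function(m) combn(predictors, m, simplify = FALSE)),
  recursive = FALSE
)

# Fit one model per subset and record its metrics
results <- do.call(rbind, lapply(subsets, function(vars) {
  fit <- lm(reformulate(vars, response = "mpg"), data = mtcars)
  s <- summary(fit)
  data.frame(
    predictors = paste(vars, collapse = " "),
    n          = length(vars),
    rsquare    = s$r.squared,
    adjr       = s$adj.r.squared,
    aic        = AIC(fit)
  )
}))

nrow(results)   # 7 models for 3 predictors (2^3 - 1)
results
```

The work grows with the number of subsets, which is why this approach only suits a small number of predictors.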

Choosing a Criterion

Let’s use what we just learned about sorting a dataframe to easily find the best combination of predictors.

result <- ols_step_all_possible(model_f)$result
result[order(result$adjr, decreasing = TRUE), c("predictors", "adjr")]
                     predictors        adjr
7              Age Fear_falling  0.32064670
13        Age MoCA Fear_falling  0.31720741
12      Age Weight Fear_falling  0.31489686
15 Age Weight MoCA Fear_falling  0.31420767
1                           Age  0.25163122
6                      Age MoCA  0.24840426
11              Age Weight MoCA  0.24746266
5                    Age Weight  0.24741500
10            MoCA Fear_falling  0.18316801
14     Weight MoCA Fear_falling  0.16817562
4                  Fear_falling  0.13872573
9           Weight Fear_falling  0.11886454
3                          MoCA  0.06148997
8                   Weight MoCA  0.04549678
2                        Weight -0.02234858
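
Adjusted \(r^2\) is a sensible thing to sort by because plain \(r^2\) never decreases when a predictor is added, while the adjusted version penalizes each extra predictor. As a small check, the sketch below recomputes two of the adjusted values from the all possible regression output. The helper adj_r2 is a name made up for this example, and n = 44 is inferred from the 42 residual degrees of freedom of the Age-only model.

```r
# Adjusted r^2 from r^2, number of predictors p, and sample size n
adj_r2 <- function(r2, p, n) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 44   # 42 residual df + 2 coefficients in TuG ~ Age

adj_r2(0.269035144, p = 1, n = n)   # about 0.2516 (Age only)
adj_r2(0.378002304, p = 4, n = n)   # about 0.3142 (all four predictors)
```

The four-predictor model has the highest \(r^2\) but not the highest adjusted \(r^2\), which is why the two criteria can rank models differently.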

We could also sort by another criterion, such as the Akaike information criterion (AIC). Lower AIC is better, so this time we sort in ascending order:

result <- ols_step_all_possible(model_f)$result
result[order(result$aic), c("predictors", "aic")]
                     predictors      aic
7              Age Fear_falling 203.7052
13        Age MoCA Fear_falling 204.8409
12      Age Weight Fear_falling 204.9895
15 Age Weight MoCA Fear_falling 205.9198
1                           Age 207.0227
6                      Age MoCA 208.1517
5                    Age Weight 208.2096
11              Age Weight MoCA 209.1203
10            MoCA Fear_falling 211.8140
4                  Fear_falling 213.2054
14     Weight MoCA Fear_falling 213.5278
9           Weight Fear_falling 215.1483
3                          MoCA 216.9842
8                   Weight MoCA 218.6674
2                        Weight 220.7490

Best Subset Regression

All possible regression gives us a lot of options, but oftentimes we only care about the best model for each number of predictors. olsrr has another function that does just this.

ols_step_best_subset(model_f)
          Best Subsets Regression          
-------------------------------------------
Model Index    Predictors
-------------------------------------------
     1         Age                          
     2         Age Fear_falling             
     3         Age MoCA Fear_falling        
     4         Age Weight MoCA Fear_falling 
-------------------------------------------

                                                   Subsets Regression Summary                                                   
--------------------------------------------------------------------------------------------------------------------------------
                       Adj.        Pred                                                                                          
Model    R-Square    R-Square    R-Square     C(p)       AIC        SBIC        SBC         MSEP       FPE       HSP       APC  
--------------------------------------------------------------------------------------------------------------------------------
  1        0.2690      0.2516        0.18    5.8324    207.0227    81.9930    212.3752    260.2336    6.1829    0.1442    0.8006 
  2        0.3522      0.3206      0.2197    2.6150    203.7052    79.3247    210.8419    236.3752    5.7347    0.1342    0.7425 
  3        0.3648      0.3172      0.1718    3.8250    204.8409    80.7910    213.7618    237.7204    5.8864    0.1384    0.7622 
  4        0.3780      0.3142       0.113    5.0000    205.9198    82.3024    216.6249    238.9219    6.0354    0.1426    0.7815 
--------------------------------------------------------------------------------------------------------------------------------
AIC: Akaike Information Criteria 
 SBIC: Sawa's Bayesian Information Criteria 
 SBC: Schwarz Bayesian Criteria 
 MSEP: Estimated error of prediction, assuming multivariate normality 
 FPE: Final Prediction Error 
 HSP: Hocking's Sp 
 APC: Amemiya Prediction Criteria 

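Although this output is fairly readable as is, we can also sort by whichever criterion we want. Note that without decreasing = TRUE the sort is ascending, so the model with the highest adjusted \(r^2\) appears last.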

result <- ols_step_best_subset(model_f)$metrics
result[order(result$adjr), c("predictors", "adjr")]
                    predictors      adjr
1                          Age 0.2516312
4 Age Weight MoCA Fear_falling 0.3142077
3        Age MoCA Fear_falling 0.3172074
2             Age Fear_falling 0.3206467
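
When only the single best model is wanted, which.max() and which.min() avoid sorting altogether. The sketch below types in a few columns from the Subsets Regression Summary above so that it runs without the data file; the data frame name metrics is a placeholder.

```r
# Values copied from the best subsets summary above
metrics <- data.frame(
  predictors = c("Age",
                 "Age Fear_falling",
                 "Age MoCA Fear_falling",
                 "Age Weight MoCA Fear_falling"),
  adjr = c(0.2516, 0.3206, 0.3172, 0.3142),
  aic  = c(207.0227, 203.7052, 204.8409, 205.9198),
  sbc  = c(212.3752, 210.8419, 213.7618, 216.6249)
)

# Higher is better for adjusted r^2; lower is better for AIC and SBC
metrics$predictors[which.max(metrics$adjr)]   # "Age Fear_falling"
metrics$predictors[which.min(metrics$aic)]    # "Age Fear_falling"
metrics$predictors[which.min(metrics$sbc)]    # "Age Fear_falling"
```

Here all three criteria agree on the same model, but that will not always be the case.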

Automatic Model Selection

All possible and best subset regression are good for selecting models when you only have a few predictors, but what if you have a dataset with a much larger number of predictors?

model <- lm(TuG ~ ., df)
summary(model)

Call:
lm(formula = TuG ~ ., data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.6921 -1.0600 -0.1535  0.9410  3.9243 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)  
(Intercept)        13.04472   19.05084   0.685   0.4989  
PPID                0.01023    0.02676   0.382   0.7051  
Age                 0.14419    0.06089   2.368   0.0248 *
Sex                 0.90410    1.11746   0.809   0.4251  
Height             -0.03130    0.06447  -0.485   0.6310  
Weight              0.06184    0.02952   2.095   0.0450 *
Falls              -0.00780    0.36194  -0.022   0.9830  
Balance            -0.36616    0.16608  -2.205   0.0356 *
MoCA                0.12318    0.13125   0.938   0.3557  
Concern_falling     0.12766    0.08276   1.543   0.1338  
Fear_falling        0.01513    0.02039   0.742   0.4641  
Balance_confidence  0.05290    0.03241   1.632   0.1135  
Conc_Mvmt_Proc      0.10620    0.10375   1.024   0.3145  
Sway               -0.45837    0.27464  -1.669   0.1059  
Stability          -0.04630    0.02566  -1.805   0.0815 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.895 on 29 degrees of freedom
Multiple R-squared:  0.6937,    Adjusted R-squared:  0.5458 
F-statistic: 4.691 on 14 and 29 DF,  p-value: 0.000217

Forward Selection

Forward selection is a stepwise method to build a predictive model by starting with no independent variables. At each step, the algorithm evaluates all potential predictors and adds the one that most improves the model, based on criteria like the lowest p-value or the greatest increase in adjusted R². This process continues until no remaining variables significantly improve the model.

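This code runs p-value-based forward selection, using the predictors in our full model as candidates, to choose a model for predicting TuG. The result is the model the stepwise search settles on, which is not guaranteed to be the best of all possible models. Note that the last variables added below (Stability and Concern_falling) are not significant at the 0.05 level in the final model, because the function’s default p-value threshold for entry is more lenient than 0.05.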

ols_step_forward_p(model)

                               Stepwise Summary                                
-----------------------------------------------------------------------------
Step    Variable             AIC        SBC       SBIC       R2       Adj. R2 
-----------------------------------------------------------------------------
 0      Base Model         218.812    222.380    92.302    0.00000    0.00000 
 1      Balance            192.134    197.486    66.811    0.47888    0.46647 
 2      Sway               189.778    196.915    64.819    0.52800    0.50497 
 3      Age                187.654    196.575    63.425    0.57022    0.53799 
 4      Weight             186.840    197.545    63.454    0.59685    0.55550 
 5      Stability          186.398    198.888    64.059    0.61861    0.56843 
 6      Concern_falling    186.965    201.238    65.589    0.63084    0.57097 
-----------------------------------------------------------------------------

Final Model Output 
------------------

                         Model Summary                          
---------------------------------------------------------------
R                       0.794       RMSE                 1.689 
R-Squared               0.631       MSE                  2.851 
Adj. R-Squared          0.571       Coef. Var           17.252 
Pred R-Squared          0.337       AIC                186.965 
MAE                     1.396       SBC                201.238 
---------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 
 AIC: Akaike Information Criteria 
 SBC: Schwarz Bayesian Criteria 

                               ANOVA                                
-------------------------------------------------------------------
               Sum of                                              
              Squares        DF    Mean Square      F         Sig. 
-------------------------------------------------------------------
Regression    214.368         6         35.728    10.538    0.0000 
Residual      125.446        37          3.390                     
Total         339.814        43                                    
-------------------------------------------------------------------

                                    Parameter Estimates                                      
--------------------------------------------------------------------------------------------
          model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
--------------------------------------------------------------------------------------------
    (Intercept)    27.774         9.958                  2.789    0.008     7.598    47.951 
        Balance    -0.464         0.133       -0.442    -3.484    0.001    -0.734    -0.194 
           Sway    -0.611         0.230       -0.295    -2.657    0.012    -1.078    -0.145 
            Age     0.103         0.048        0.255     2.134    0.040     0.005     0.201 
         Weight     0.038         0.021        0.209     1.840    0.074    -0.004     0.080 
      Stability    -0.021         0.014       -0.162    -1.488    0.145    -0.049     0.008 
Concern_falling     0.074         0.067        0.119     1.107    0.275    -0.062     0.210 
--------------------------------------------------------------------------------------------
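
To make the procedure concrete, here is a minimal base-R sketch of p-value-based forward selection. It is not how olsrr implements ols_step_forward_p(). The function name forward_select_p, the entry threshold p_enter = 0.3, and the built-in mtcars data (standing in for the lab data) are all assumptions made for illustration.

```r
# Hypothetical stand-in data: a few mtcars columns replace the lab's data set.
data(mtcars)
dat <- mtcars[, c("mpg", "wt", "hp", "qsec", "drat")]

# Illustrative helper (not part of olsrr): add one predictor at a time,
# always the candidate with the smallest p-value, until none is below p_enter.
forward_select_p <- function(data, response, p_enter = 0.3) {
  remaining <- setdiff(names(data), response)
  selected <- character(0)
  while (length(remaining) > 0) {
    # p-value of each candidate when added to the current set of predictors
    p_vals <- sapply(remaining, function(v) {
      fit <- lm(reformulate(c(selected, v), response = response), data = data)
      coef(summary(fit))[v, "Pr(>|t|)"]
    })
    best <- names(which.min(p_vals))
    if (p_vals[best] > p_enter) break  # no candidate passes the threshold
    selected <- c(selected, best)
    remaining <- setdiff(remaining, best)
  }
  terms_rhs <- if (length(selected) > 0) selected else "1"
  list(selected = selected,
       fit = lm(reformulate(terms_rhs, response = response), data = data))
}

fwd <- forward_select_p(dat, "mpg")
fwd$selected      # predictors in the order they entered
summary(fwd$fit)  # the model chosen by this sketch
```

The order of fwd$selected plays the same role as the Variable column in the stepwise summary above: each entry is the predictor added at that step.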

Backward Elimination

Backward elimination starts with a full model that includes all candidate predictors. At each step, the variable that contributes the least (typically the one with the highest p-value) is removed. This process repeats until every remaining variable meets the retention criterion, for example a p-value below a chosen threshold. In the stepwise summary below, each row names the variable removed at that step.

ols_step_backward_p(model)

                               Stepwise Summary                               
----------------------------------------------------------------------------
Step    Variable            AIC        SBC       SBIC       R2       Adj. R2 
----------------------------------------------------------------------------
 0      Full Model        194.752    223.299    84.867    0.69370    0.54583 
 1      Falls             192.752    219.515    81.833    0.69369    0.56096 
 2      PPID              190.974    215.952    78.837    0.69215    0.57298 
 3      Height            189.282    212.476    75.895    0.68998    0.58342 
 4      MoCA              188.276    209.686    73.219    0.68290    0.58681 
 5      Conc_Mvmt_Proc    187.173    206.799    70.646    0.67637    0.59070 
----------------------------------------------------------------------------

Final Model Output 
------------------

                         Model Summary                          
---------------------------------------------------------------
R                       0.822       RMSE                 1.581 
R-Squared               0.676       MSE                  2.499 
Adj. R-Squared          0.591       Coef. Var           16.851 
Pred R-Squared          0.336       AIC                187.173 
MAE                     1.305       SBC                206.799 
---------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 
 AIC: Akaike Information Criteria 
 SBC: Schwarz Bayesian Criteria 

                              ANOVA                                
------------------------------------------------------------------
               Sum of                                             
              Squares        DF    Mean Square      F        Sig. 
------------------------------------------------------------------
Regression    229.840         9         25.538    7.895    0.0000 
Residual      109.974        34          3.235                    
Total         339.814        43                                   
------------------------------------------------------------------

                                      Parameter Estimates                                        
------------------------------------------------------------------------------------------------
             model      Beta    Std. Error    Std. Beta      t        Sig       lower     upper 
------------------------------------------------------------------------------------------------
       (Intercept)    11.065        12.735                  0.869    0.391    -14.815    36.944 
               Age     0.147         0.052        0.364     2.808    0.008      0.041     0.253 
               Sex     1.087         0.876        0.178     1.240    0.223     -0.694     2.867 
            Weight     0.052         0.025        0.283     2.058    0.047      0.001     0.102 
           Balance    -0.329         0.145       -0.313    -2.272    0.030     -0.623    -0.035 
   Concern_falling     0.132         0.076        0.211     1.730    0.093     -0.023     0.287 
      Fear_falling     0.027         0.016        0.238     1.682    0.102     -0.006     0.060 
Balance_confidence     0.057         0.028        0.480     2.035    0.050      0.000     0.113 
              Sway    -0.459         0.235       -0.222    -1.951    0.059     -0.938     0.019 
         Stability    -0.051         0.023       -0.398    -2.262    0.030     -0.097    -0.005 
------------------------------------------------------------------------------------------------
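
The removal loop can be sketched in base R in the same spirit. Again, this is an illustration rather than olsrr's implementation of ols_step_backward_p(). The function name backward_eliminate_p, the removal threshold p_remove = 0.3, and the mtcars stand-in data are assumptions.

```r
# Hypothetical stand-in data: a few mtcars columns replace the lab's data set.
data(mtcars)
dat <- mtcars[, c("mpg", "wt", "hp", "qsec", "drat")]

# Illustrative helper (not part of olsrr): start from all predictors and
# repeatedly drop the one with the largest p-value while it exceeds p_remove.
backward_eliminate_p <- function(data, response, p_remove = 0.3) {
  kept <- setdiff(names(data), response)
  while (length(kept) > 0) {
    fit <- lm(reformulate(kept, response = response), data = data)
    # setNames keeps the names even when only one predictor is left
    p_vals <- setNames(coef(summary(fit))[kept, "Pr(>|t|)"], kept)
    worst <- names(which.max(p_vals))
    if (p_vals[worst] <= p_remove) break  # everything left passes the threshold
    kept <- setdiff(kept, worst)
  }
  kept
}

kept <- backward_eliminate_p(dat, "mpg")
kept  # predictors that survive elimination
final_fit <- lm(reformulate(kept, response = "mpg"), data = dat)
summary(final_fit)
```

Because forward and backward procedures examine different sequences of models, they need not end at the same set of predictors, which is also what the two outputs above show.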

Understanding Forward and Backward Selection

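As is, the output may be a bit hard to understand. Let’s visualize both methods to see what they are giving us. Note that the plots below come from the AIC-based functions, ols_step_forward_aic() and ols_step_backward_aic(), so the variables they select may differ from the p-value-based output above.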

First, forward selection:

plot(ols_step_forward_aic(model))

Backward elimination:

plot(ols_step_backward_aic(model))

Drawbacks

We show you how to do forward and backward model selection so you have an idea of what is used in the field.

  • However, these methods have recently fallen out of favor because they are often misused for confirmatory data analysis when they are appropriate only for exploratory data analysis.

  • Wikipedia article on overfitting

  • Better alternatives, such as lasso and ridge regression, are popular in machine learning and are used in regression as well.
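
As a rough illustration of the penalized idea, the sketch below computes ridge coefficients from the closed-form solution in base R and compares them with ordinary least squares. The helper ridge_coef, the penalty lambda = 10, and the mtcars stand-in data are assumptions for illustration. In practice a package such as glmnet would normally be used, with the penalty chosen by cross-validation, and the lasso (which can shrink coefficients exactly to zero) has no closed form like this one.

```r
# Hypothetical stand-in data: a few mtcars columns replace the lab's data set.
data(mtcars)
X <- scale(as.matrix(mtcars[, c("wt", "hp", "qsec", "drat")]))  # standardized predictors
y <- mtcars$mpg - mean(mtcars$mpg)                               # centered response

# Illustrative helper: closed-form ridge solution (X'X + lambda * I)^(-1) X'y
ridge_coef <- function(X, y, lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

beta_ols   <- ridge_coef(X, y, lambda = 0)   # lambda = 0 reduces to least squares
beta_ridge <- ridge_coef(X, y, lambda = 10)  # arbitrary penalty, for illustration only

round(cbind(OLS = beta_ols, ridge = beta_ridge), 3)
```

Instead of dropping variables in discrete steps, the penalty shrinks all coefficients toward zero at once, which tends to give more stable estimates when predictors are correlated.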

In the end, it is best to form hypotheses before the experiment and compare only the models those hypotheses imply, for example with a general linear F-test.
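
A general linear F-test between two nested models can be run with anova() in base R. The sketch below uses mtcars as stand-in data, and the two formulas are hypothetical examples of models chosen in advance, not models from this lab.

```r
# Hypothetical stand-in data and pre-specified models.
data(mtcars)
reduced <- lm(mpg ~ wt, data = mtcars)              # model under the null hypothesis
full    <- lm(mpg ~ wt + hp + qsec, data = mtcars)  # adds the terms being tested

# F-test of whether hp and qsec jointly improve on the reduced model
f_test <- anova(reduced, full)
f_test
```

The second row of the table reports the F statistic and its p-value for the added terms. The reduced model must be nested in the full model for this comparison to be valid.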