Variable selection procedures in R using the ‘olsrr’ package. Watch the video by Mike Crowson, Ph.D., to complement what is covered in this tutorial. January 14, 2020
Download .RData file here: https://drive.google.com/open?id=1_cRuM2-4dNhNxVPzgj8ddlpcrV-TAUZ2
Data are contained in a data frame named: ‘regdata’. Package information can be found here: https://cran.r-project.org/web/packages/olsrr/olsrr.pdf
If you haven’t already done so, install the package with install.packages("olsrr") before loading it.
library(olsrr)
### Read the data (assumes regdata.csv is in the working directory) and show the first five rows
regdata <- read.csv("regdata.csv")
head(regdata,5)
id perfgoal achieve mastery interest anxiety genderid masteryLMH
1 1 32.00000 6.125 5.714286 6.0 1.666667 1 3
2 2 32.25655 1.625 1.428571 4.0 6.333333 1 1
3 3 37.88265 4.500 1.285714 2.0 3.666667 1 1
4 4 58.09477 2.375 2.285714 4.0 3.666667 1 2
5 5 26.73999 5.125 4.571429 5.5 3.666667 0 3
perfgoalLMH interestMS
1 2 2
2 2 2
3 2 1
4 3 2
5 2 2
str(regdata)
'data.frame': 140 obs. of 10 variables:
$ id : int 1 2 3 4 5 6 7 8 9 10 ...
$ perfgoal : num 32 32.3 37.9 58.1 26.7 ...
$ achieve : num 6.12 1.62 4.5 2.38 5.12 ...
$ mastery : num 5.71 1.43 1.29 2.29 4.57 ...
$ interest : num 6 4 2 4 5.5 4 4 5 4.5 4 ...
$ anxiety : num 1.67 6.33 3.67 3.67 3.67 ...
$ genderid : int 1 1 1 1 0 1 1 1 1 1 ...
$ masteryLMH : int 3 1 1 2 3 2 2 2 2 2 ...
$ perfgoalLMH: int 2 2 2 3 2 3 1 2 2 2 ...
$ interestMS : int 2 2 1 2 2 2 2 2 2 2 ...
### Forward selection using p-values: a predictor enters only if its p-value is below penter
model<-lm(achieve~mastery+interest+anxiety+perfgoal+genderid,
data=regdata)
FWDfit.p<-ols_step_forward_p(model,penter=.05)
FWDfit.p
Selection Summary
-------------------------------------------------------------------------
Variable Adj.
Step Entered R-Square R-Square C(p) AIC RMSE
-------------------------------------------------------------------------
1 mastery 0.3304 0.3255 16.5745 414.0549 1.0467
2 interest 0.3846 0.3756 6.2151 404.2283 1.0070
3 perfgoal 0.4044 0.3912 3.7171 401.6636 0.9944
-------------------------------------------------------------------------
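The forward p-value procedure summarized above can be sketched with base R alone. This is a minimal illustration under stated assumptions, not the code olsrr runs internally: the built-in mtcars data stands in for regdata, the candidate list is illustrative, and the variable name penter simply mirrors the .05 entry cutoff used above.

```r
# Forward selection by p-value, sketched with base R only.
# Assumptions: mtcars stands in for regdata; the candidate list is illustrative.
dat        <- mtcars
candidates <- c("wt", "hp", "qsec", "drat", "am")
penter     <- 0.05

fit        <- lm(mpg ~ 1, data = dat)      # start from the intercept-only model
full_scope <- reformulate(candidates)      # ~ wt + hp + qsec + drat + am

repeat {
  remaining <- setdiff(candidates, attr(terms(fit), "term.labels"))
  if (length(remaining) == 0) break        # every candidate has already entered

  # Partial F test for adding each remaining candidate to the current model
  tests <- add1(fit, scope = full_scope, test = "F")
  tests <- tests[rownames(tests) != "<none>", , drop = FALSE]

  best <- which.min(tests[["Pr(>F)"]])
  if (tests[["Pr(>F)"]][best] >= penter) break   # no candidate meets the entry cutoff

  # Add the candidate with the smallest p-value and repeat
  fit <- update(fit, as.formula(paste(". ~ . +", rownames(tests)[best])))
}

formula(fit)
```

Each pass adds the single candidate with the smallest p-value, and the loop stops as soon as no remaining candidate falls below the cutoff, which is why anxiety and genderid never appear in the summary above.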
### Forward selection using AIC: at each step, add the predictor that lowers AIC the most
model<-lm(achieve~mastery+interest+anxiety+perfgoal+genderid,
data=regdata)
FWDfit.aic<-ols_step_forward_aic(model)
FWDfit.aic
Selection Summary
-----------------------------------------------------------------
Variable AIC Sum Sq RSS R-Sq Adj. R-Sq
-----------------------------------------------------------------
mastery 414.055 74.585 151.176 0.33037 0.32552
interest 404.228 86.831 138.930 0.38462 0.37563
perfgoal 401.664 91.288 134.473 0.40436 0.39122
-----------------------------------------------------------------
### Plot AIC across steps, then rerun with details=TRUE to print each step of the search
plot(FWDfit.aic)
FWDfit.aic<-ols_step_forward_aic(model,details=TRUE)
Forward Selection Method
------------------------
Candidate Terms:
1 . mastery
2 . interest
3 . anxiety
4 . perfgoal
5 . genderid
Step 0: AIC = 468.1994
achieve ~ 1
---------------------------------------------------------------------
Variable DF AIC Sum Sq RSS R-Sq Adj. R-Sq
---------------------------------------------------------------------
mastery 1 414.055 74.585 151.176 0.330 0.326
interest 1 423.180 64.403 161.357 0.285 0.280
perfgoal 1 458.101 18.691 207.070 0.083 0.076
anxiety 1 462.003 12.838 212.923 0.057 0.050
genderid 1 468.566 2.618 223.143 0.012 0.004
---------------------------------------------------------------------
- mastery
Step 1 : AIC = 414.0549
achieve ~ mastery
---------------------------------------------------------------------
Variable DF AIC Sum Sq RSS R-Sq Adj. R-Sq
---------------------------------------------------------------------
interest 1 404.228 12.246 138.930 0.385 0.376
perfgoal 1 410.577 5.800 145.375 0.356 0.347
genderid 1 415.082 1.047 150.129 0.335 0.325
anxiety 1 415.583 0.509 150.667 0.333 0.323
---------------------------------------------------------------------
- interest
Step 2 : AIC = 404.2283
achieve ~ mastery + interest
---------------------------------------------------------------------
Variable DF AIC Sum Sq RSS R-Sq Adj. R-Sq
---------------------------------------------------------------------
perfgoal 1 401.664 4.457 134.473 0.404 0.391
genderid 1 405.265 0.953 137.977 0.389 0.375
anxiety 1 405.671 0.552 138.378 0.387 0.374
---------------------------------------------------------------------
- perfgoal
Step 3 : AIC = 401.6636
achieve ~ mastery + interest + perfgoal
---------------------------------------------------------------------
Variable DF AIC Sum Sq RSS R-Sq Adj. R-Sq
---------------------------------------------------------------------
genderid 1 402.058 1.533 132.939 0.411 0.394
anxiety 1 403.224 0.421 134.052 0.406 0.389
---------------------------------------------------------------------
No more variables to be added.
Variables Entered:
- mastery
- interest
- perfgoal
Final Model Output
------------------
Model Summary
--------------------------------------------------------------
R 0.636 RMSE 0.994
R-Squared 0.404 Coef. Var 29.714
Adj. R-Squared 0.391 MSE 0.989
Pred R-Squared 0.355 MAE 0.786
--------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
ANOVA
--------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------
Regression 91.288 3 30.429 30.775 0.0000
Residual 134.473 136 0.989
Total 225.761 139
--------------------------------------------------------------------
Parameter Estimates
----------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
----------------------------------------------------------------------------------------
(Intercept) 2.002 0.290 6.900 0.000 1.428 2.576
mastery 0.340 0.077 0.373 4.437 0.000 0.188 0.491
interest 0.199 0.060 0.278 3.321 0.001 0.081 0.318
perfgoal -0.009 0.004 -0.145 -2.123 0.036 -0.018 -0.001
----------------------------------------------------------------------------------------
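The AIC values in the trace above appear consistent with the definition used by stats::AIC() for a linear model, where the error variance counts as one estimated parameter. The check below is a sketch: the helper name aic_from_rss is hypothetical, and n and the RSS values are copied from the output above (rounded to three decimals, so agreement is only approximate).

```r
# Hypothetical helper: AIC of a Gaussian linear model from its residual sum of squares.
# k counts regression coefficients including the intercept; the extra +1 is the error variance.
aic_from_rss <- function(rss, n, k) {
  n * log(2 * pi) + n * log(rss / n) + n + 2 * (k + 1)
}

n <- 140                                      # observations in regdata, from str(regdata)

aic_step0 <- aic_from_rss(225.761, n, k = 1)  # achieve ~ 1; reported above as 468.1994
aic_step3 <- aic_from_rss(134.473, n, k = 4)  # mastery + interest + perfgoal; reported as 401.6636
c(step0 = aic_step0, step3 = aic_step3)

# The same formula agrees with stats::AIC() on an lm fit (mtcars used as a stand-in)
fit <- lm(mpg ~ wt + hp, data = mtcars)
all.equal(aic_from_rss(sum(resid(fit)^2), nrow(mtcars), k = 3), AIC(fit))
```

Because n is fixed across candidate models, only the n * log(RSS / n) term and the 2 * (k + 1) penalty change from step to step. A predictor is added only when its reduction in RSS outweighs the penalty for one more parameter.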
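### Backward elimination using p-values: starting from the full model, remove predictors with p-values above prem
model<-lm(achieve~mastery+interest+anxiety+perfgoal+genderid,
data=regdata)
BWDfit.p<-ols_step_backward_p(model,prem=.05)
BWDfit.p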
Elimination Summary
------------------------------------------------------------------------
Variable Adj.
Step Removed R-Square R-Square C(p) AIC RMSE
------------------------------------------------------------------------
1 anxiety 0.4111 0.3937 4.1695 402.0579 0.9923
2 genderid 0.4044 0.3912 3.7171 401.6636 0.9944
------------------------------------------------------------------------
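Backward elimination by p-value can be sketched the same way with drop1() from base R. This is an illustration under assumptions rather than olsrr's own implementation: mtcars stands in for regdata, and prem mirrors the .05 removal cutoff used above.

```r
# Backward elimination by p-value, base R only.
# Assumptions: mtcars stands in for regdata; the predictor list is illustrative.
prem <- 0.05
fit  <- lm(mpg ~ wt + hp + qsec + drat + am, data = mtcars)  # start from the full model

repeat {
  if (length(attr(terms(fit), "term.labels")) == 0) break    # nothing left to drop

  # Partial F test for dropping each predictor from the current model
  tests <- drop1(fit, test = "F")
  tests <- tests[rownames(tests) != "<none>", , drop = FALSE]

  worst <- which.max(tests[["Pr(>F)"]])
  if (tests[["Pr(>F)"]][worst] <= prem) break                # every remaining predictor is retained

  # Remove the predictor with the largest p-value and refit
  fit <- update(fit, as.formula(paste(". ~ . -", rownames(tests)[worst])))
}

formula(fit)
```

The loop removes one predictor per pass, always the one with the largest p-value, which matches the one-variable-per-step layout of the elimination summary above.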
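### Backward elimination using AIC: at each step, remove the predictor whose removal lowers AIC the most
BWDfit.aic<-ols_step_backward_aic(model)
BWDfit.aic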
Backward Elimination Summary
------------------------------------------------------------------
Variable AIC RSS Sum Sq R-Sq Adj. R-Sq
------------------------------------------------------------------
Full Model 403.881 132.772 92.989 0.41189 0.38995
anxiety 402.058 132.939 92.821 0.41115 0.39370
genderid 401.664 134.473 91.288 0.40436 0.39122
------------------------------------------------------------------
### Plot AIC across steps, then rerun with details=TRUE to print each step of the search
plot(BWDfit.aic)
BWDfit.aic<-ols_step_backward_aic(model,details=TRUE)
Backward Elimination Method
---------------------------
Candidate Terms:
1 . mastery
2 . interest
3 . anxiety
4 . perfgoal
5 . genderid
Step 0: AIC = 403.881
achieve ~ mastery + interest + anxiety + perfgoal + genderid
---------------------------------------------------------------------
Variable DF AIC Sum Sq RSS R-Sq Adj. R-Sq
---------------------------------------------------------------------
anxiety 1 402.058 0.168 132.939 0.411 0.394
genderid 1 403.224 1.280 134.052 0.406 0.389
perfgoal 1 406.940 4.886 137.657 0.390 0.372
interest 1 412.775 10.745 143.516 0.364 0.345
mastery 1 418.293 16.514 149.285 0.339 0.319
---------------------------------------------------------------------
Variables Removed:
- anxiety
Step 1 : AIC = 402.0579
achieve ~ mastery + interest + perfgoal + genderid
---------------------------------------------------------------------
Variable DF AIC Sum Sq RSS R-Sq Adj. R-Sq
---------------------------------------------------------------------
genderid 1 401.664 1.533 134.473 0.404 0.391
perfgoal 1 405.265 5.037 137.977 0.389 0.375
interest 1 410.897 10.701 143.640 0.364 0.350
mastery 1 418.493 18.710 151.649 0.328 0.313
---------------------------------------------------------------------
- genderid
Step 2 : AIC = 401.6636
achieve ~ mastery + interest + perfgoal
---------------------------------------------------------------------
Variable DF AIC Sum Sq RSS R-Sq Adj. R-Sq
---------------------------------------------------------------------
perfgoal 1 404.228 4.457 138.930 0.385 0.376
interest 1 410.577 10.902 145.375 0.356 0.347
mastery 1 418.590 19.466 153.939 0.318 0.308
---------------------------------------------------------------------
No more variables to be removed.
Variables Removed:
- anxiety
- genderid
Final Model Output
------------------
Model Summary
--------------------------------------------------------------
R 0.636 RMSE 0.994
R-Squared 0.404 Coef. Var 29.714
Adj. R-Squared 0.391 MSE 0.989
Pred R-Squared 0.355 MAE 0.786
--------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
ANOVA
--------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------
Regression 91.288 3 30.429 30.775 0.0000
Residual 134.473 136 0.989
Total 225.761 139
--------------------------------------------------------------------
Parameter Estimates
----------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
----------------------------------------------------------------------------------------
(Intercept) 2.002 0.290 6.900 0.000 1.428 2.576
mastery 0.340 0.077 0.373 4.437 0.000 0.188 0.491
interest 0.199 0.060 0.278 3.321 0.001 0.081 0.318
perfgoal -0.009 0.004 -0.145 -2.123 0.036 -0.018 -0.001
----------------------------------------------------------------------------------------
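### Stepwise (both directions) selection using p-values: pent is the entry cutoff, prem the removal cutoff
model<-lm(achieve~mastery+interest+anxiety+perfgoal+genderid,
data=regdata)
Bothfit.p<-ols_step_both_p(model,pent=.05,prem=.05)
Bothfit.p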
Stepwise Selection Summary
-------------------------------------------------------------------------------------
Added/ Adj.
Step Variable Removed R-Square R-Square C(p) AIC RMSE
-------------------------------------------------------------------------------------
1 mastery addition 0.330 0.326 16.5750 414.0549 1.0467
2 interest addition 0.385 0.376 6.2150 404.2283 1.0070
3 perfgoal addition 0.404 0.391 3.7170 401.6636 0.9944
-------------------------------------------------------------------------------------
### Stepwise (both directions) selection using AIC
model<-lm(achieve~mastery+interest+anxiety+perfgoal+genderid, data=regdata)
Bothfit.aic<-ols_step_both_aic(model)
Bothfit.aic
Stepwise Summary
----------------------------------------------------------------------------
Variable Method AIC RSS Sum Sq R-Sq Adj. R-Sq
----------------------------------------------------------------------------
mastery addition 414.055 151.176 74.585 0.33037 0.32552
interest addition 404.228 138.930 86.831 0.38462 0.37563
perfgoal addition 401.664 134.473 91.288 0.40436 0.39122
----------------------------------------------------------------------------
plot(Bothfit.aic)
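Base R offers a comparable both-directions search through stats::step(), which can serve as a cross-check on the stepwise AIC result. The sketch below assumes mtcars as a stand-in for regdata. Note that step() ranks models with extractAIC(), which for a fixed data set differs from AIC() only by an additive constant, so the ordering of models is the same even though the printed values differ.

```r
# Stepwise (both directions) selection by AIC using stats::step().
# Assumptions: mtcars stands in for regdata; the upper scope is illustrative.
null_fit <- lm(mpg ~ 1, data = mtcars)

both_fit <- step(null_fit,
                 scope     = list(lower = ~ 1,
                                  upper = ~ wt + hp + qsec + drat + am),
                 direction = "both",   # consider additions and removals at each step
                 trace     = 0)        # suppress the step-by-step printout

formula(both_fit)
both_fit$anova                         # record of which terms were added or removed
```

Starting from the intercept-only model with direction = "both" lets a predictor that entered early be dropped later if it becomes redundant. In the regdata run above, no removals occurred, so the stepwise result coincides with forward selection.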
### All possible subsets regression
model<-lm(achieve~mastery+interest+anxiety+perfgoal+genderid,
data=regdata)
modcompare<-ols_step_all_possible(model)
modcompare
Index N Predictors R-Square Adj. R-Square
1 1 1 mastery 0.33037183 0.32551945
2 2 1 interest 0.28527225 0.28009306
4 3 1 perfgoal 0.08279112 0.07614468
3 4 1 anxiety 0.05686591 0.05003161
5 5 1 genderid 0.01159687 0.00443453
6 6 2 mastery interest 0.38461546 0.37563174
8 7 2 mastery perfgoal 0.35606477 0.34666426
9 8 2 mastery genderid 0.33500869 0.32530079
7 9 2 mastery anxiety 0.33262424 0.32288153
11 10 2 interest perfgoal 0.31813350 0.30817925
10 11 2 interest anxiety 0.30354680 0.29337960
12 12 2 interest genderid 0.29162154 0.28128025
13 13 2 anxiety perfgoal 0.12526400 0.11249413
15 14 2 perfgoal genderid 0.10151651 0.08839996
14 15 2 anxiety genderid 0.06049667 0.04678130
17 16 3 mastery interest perfgoal 0.40435677 0.39121758
18 17 3 mastery interest genderid 0.38883726 0.37535573
16 18 3 mastery interest anxiety 0.38705972 0.37353898
21 19 3 mastery perfgoal genderid 0.36374981 0.34971487
19 20 3 mastery anxiety perfgoal 0.35770200 0.34353366
20 21 3 mastery anxiety genderid 0.33623356 0.32159166
22 22 3 interest anxiety perfgoal 0.33280150 0.31808389
24 23 3 interest perfgoal genderid 0.32827467 0.31345720
23 24 3 interest anxiety genderid 0.30646123 0.29116258
25 25 3 anxiety perfgoal genderid 0.13411781 0.11501747
28 26 4 mastery interest perfgoal genderid 0.41114901 0.39370157
26 27 4 mastery interest anxiety perfgoal 0.40622190 0.38862848
27 28 4 mastery interest anxiety genderid 0.39025253 0.37218594
29 29 4 mastery anxiety perfgoal genderid 0.36429984 0.34546428
30 30 4 interest anxiety perfgoal genderid 0.33874576 0.31915304
31 31 5 mastery interest anxiety perfgoal genderid 0.41189279 0.38994849
Mallow's Cp
1 16.574518
2 26.850439
4 72.985686
3 78.892735
5 89.207268
6 6.215129
8 12.720391
9 17.518010
7 18.061308
11 21.363016
10 24.686590
12 27.403756
13 65.308257
15 70.719113
14 80.065467
17 3.717079
18 7.253192
16 7.658203
21 12.969359
19 14.347349
20 19.238924
22 20.020918
24 21.052355
23 26.022538
25 65.290920
28 4.169470
26 5.292109
27 8.930723
29 14.844035
30 20.666521
31 6.000000
### Convert to a data frame to show every criterion computed for each model
as.data.frame(modcompare)
mindex n predictors rsquare adjr
1 1 1 mastery 0.33037183 0.32551945
2 2 1 interest 0.28527225 0.28009306
4 3 1 perfgoal 0.08279112 0.07614468
3 4 1 anxiety 0.05686591 0.05003161
5 5 1 genderid 0.01159687 0.00443453
6 6 2 mastery interest 0.38461546 0.37563174
8 7 2 mastery perfgoal 0.35606477 0.34666426
9 8 2 mastery genderid 0.33500869 0.32530079
7 9 2 mastery anxiety 0.33262424 0.32288153
11 10 2 interest perfgoal 0.31813350 0.30817925
10 11 2 interest anxiety 0.30354680 0.29337960
12 12 2 interest genderid 0.29162154 0.28128025
13 13 2 anxiety perfgoal 0.12526400 0.11249413
15 14 2 perfgoal genderid 0.10151651 0.08839996
14 15 2 anxiety genderid 0.06049667 0.04678130
17 16 3 mastery interest perfgoal 0.40435677 0.39121758
18 17 3 mastery interest genderid 0.38883726 0.37535573
16 18 3 mastery interest anxiety 0.38705972 0.37353898
21 19 3 mastery perfgoal genderid 0.36374981 0.34971487
19 20 3 mastery anxiety perfgoal 0.35770200 0.34353366
20 21 3 mastery anxiety genderid 0.33623356 0.32159166
22 22 3 interest anxiety perfgoal 0.33280150 0.31808389
24 23 3 interest perfgoal genderid 0.32827467 0.31345720
23 24 3 interest anxiety genderid 0.30646123 0.29116258
25 25 3 anxiety perfgoal genderid 0.13411781 0.11501747
28 26 4 mastery interest perfgoal genderid 0.41114901 0.39370157
26 27 4 mastery interest anxiety perfgoal 0.40622190 0.38862848
27 28 4 mastery interest anxiety genderid 0.39025253 0.37218594
29 29 4 mastery anxiety perfgoal genderid 0.36429984 0.34546428
30 30 4 interest anxiety perfgoal genderid 0.33874576 0.31915304
31 31 5 mastery interest anxiety perfgoal genderid 0.41189279 0.38994849
predrsq cp aic sbic sbc msep fpe
1 0.30918074 16.574518 414.0549 16.408828 422.8798 153.3669 1.111126
2 0.26469681 26.850439 423.1799 25.276512 432.0049 163.6962 1.185961
4 0.04243592 72.985686 458.1006 59.259526 466.9256 210.0711 1.521941
3 0.02731876 78.892735 462.0029 63.063128 470.8278 216.0088 1.564959
5 -0.01698590 89.207268 468.5664 69.463903 477.3913 226.3769 1.640075
6 0.35215300 6.215129 404.2283 6.916331 415.9949 141.9797 1.035815
8 0.31654714 12.720391 410.5774 12.995632 422.3440 148.5668 1.083872
9 0.30255926 17.518010 415.0821 17.311607 426.8486 153.4248 1.119314
7 0.30059119 18.061308 415.5832 17.791873 427.3497 153.9749 1.123327
11 0.27907319 21.363016 418.5904 20.674793 430.3570 157.3182 1.147718
10 0.27096634 24.686590 421.5538 23.516715 433.3204 160.6836 1.172270
12 0.26108316 27.403756 423.9307 25.797072 435.6973 163.4350 1.192343
13 0.07295529 65.308257 453.4628 54.197491 465.2294 201.8165 1.472356
15 0.04819453 70.719113 457.2129 57.813378 468.9794 207.2954 1.512328
14 0.01508417 80.065467 463.4629 63.844712 475.2295 216.7594 1.581372
17 0.35455488 3.717079 401.6636 4.611247 416.3718 138.4430 1.017021
18 0.34601216 7.253192 405.2646 8.004617 419.9728 142.0501 1.043520
16 0.34386008 7.658203 405.6712 8.387930 420.3794 142.4633 1.046555
21 0.31368697 12.969359 410.8966 13.317196 425.6048 147.8811 1.086355
19 0.30828652 14.347349 412.2210 14.567508 426.9292 149.2868 1.096681
20 0.29169859 19.238924 416.8239 18.915616 431.5322 154.2766 1.133337
22 0.28218668 20.020918 417.5460 19.598079 432.2542 155.0743 1.139197
24 0.27976144 21.052355 418.4926 20.493058 433.2008 156.1265 1.146927
23 0.26283806 26.022538 422.9667 24.725494 437.6749 161.1965 1.184172
25 0.06684641 65.290920 454.0385 54.243999 468.7468 201.2536 1.478437
28 0.35117288 4.169470 402.0579 5.185944 419.7078 137.8857 1.019906
26 0.34633923 5.292109 403.2245 6.267266 420.8744 139.0394 1.028439
27 0.33572745 8.930723 406.9400 9.714043 424.5898 142.7788 1.056099
29 0.30315879 14.844035 412.7755 15.136027 430.4253 148.8559 1.101050
30 0.27748785 20.666521 418.2931 20.272308 435.9429 154.8397 1.145310
31 0.34093391 6.000000 403.8810 7.111514 424.4725 138.7469 1.033296
apc hsp
1 0.6890377 0.007996178
2 0.7354445 0.008534722
4 0.9437946 0.010952593
3 0.9704713 0.011262172
5 1.0170525 0.011802740
6 0.6423357 0.007456508
8 0.6721368 0.007802452
9 0.6941150 0.008057584
7 0.6966039 0.008086476
11 0.7117293 0.008262058
10 0.7269548 0.008438803
12 0.7394023 0.008583299
13 0.9130456 0.010599024
15 0.9378331 0.010886769
14 0.9806495 0.011383799
17 0.6306811 0.007324229
18 0.6471135 0.007515062
16 0.6489956 0.007536919
21 0.6736767 0.007823546
19 0.6800802 0.007897911
20 0.7028115 0.008161895
22 0.7064455 0.008204096
24 0.7112386 0.008259760
23 0.7343352 0.008527985
25 0.9168164 0.010647178
28 0.6324696 0.007348779
26 0.6377617 0.007410269
27 0.6549139 0.007609565
29 0.6827891 0.007933451
30 0.7102360 0.008252362
31 0.6407735 0.007449866
plot(modcompare)
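All possible subsets regression fits every non-empty combination of the candidate predictors, which is why five predictors give 2^5 - 1 = 31 models in the table above. A minimal base R sketch of that enumeration follows, assuming mtcars as a stand-in for regdata and computing only R-square, adjusted R-square, and Mallows' Cp rather than the full set of criteria olsrr reports.

```r
# Enumerate every non-empty subset of candidate predictors and fit each model.
# Assumptions: mtcars stands in for regdata; the candidate list is illustrative.
candidates <- c("wt", "hp", "qsec", "drat", "am")
n          <- nrow(mtcars)

# Mallows' Cp needs the mean squared error of the full model
full_fit <- lm(reformulate(candidates, "mpg"), data = mtcars)
mse_full <- sum(resid(full_fit)^2) / df.residual(full_fit)

# List of character vectors: all subsets of size 1, then size 2, and so on
subsets <- unlist(lapply(seq_along(candidates),
                         function(k) combn(candidates, k, simplify = FALSE)),
                  recursive = FALSE)

results <- do.call(rbind, lapply(subsets, function(vars) {
  fit <- lm(reformulate(vars, "mpg"), data = mtcars)
  rss <- sum(resid(fit)^2)
  p   <- length(vars) + 1                       # coefficients including the intercept
  data.frame(n_pred     = length(vars),
             predictors = paste(vars, collapse = " "),
             rsquare    = summary(fit)$r.squared,
             adjr       = summary(fit)$adj.r.squared,
             cp         = rss / mse_full - n + 2 * p)
}))

nrow(results)                                   # one row per subset
results[order(results$cp), ][1:5, ]             # five models with the smallest Cp
```

By construction, the full model's Cp equals its number of coefficients, which matches the value of 6.000000 reported for model 31 above.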
### Best subsets regression: reports the best-fitting model at each number of predictors
model<-lm(achieve~mastery+interest+anxiety+perfgoal+genderid, data=regdata)
modcompare<-ols_step_best_subset(model)
modcompare
Best Subsets Regression
---------------------------------------------------------
Model Index Predictors
---------------------------------------------------------
1 mastery
2 mastery interest
3 mastery interest perfgoal
4 mastery interest perfgoal genderid
5 mastery interest anxiety perfgoal genderid
---------------------------------------------------------
Subsets Regression Summary
---------------------------------------------------------------------------------------------------------------------------------
Adj. Pred
Model R-Square R-Square R-Square C(p) AIC SBIC SBC MSEP FPE HSP APC
---------------------------------------------------------------------------------------------------------------------------------
1 0.3304 0.3255 0.3092 16.5745 414.0549 16.4088 422.8798 153.3669 1.1111 0.0080 0.6890
2 0.3846 0.3756 0.3522 6.2151 404.2283 6.9163 415.9949 141.9797 1.0358 0.0075 0.6423
3 0.4044 0.3912 0.3546 3.7171 401.6636 4.6112 416.3718 138.4430 1.0170 0.0073 0.6307
4 0.4111 0.3937 0.3512 4.1695 402.0579 5.1859 419.7078 137.8857 1.0199 0.0073 0.6325
5 0.4119 0.3899 0.3409 6.0000 403.8810 7.1115 424.4725 138.7469 1.0333 0.0074 0.6408
---------------------------------------------------------------------------------------------------------------------------------
AIC: Akaike Information Criteria
SBIC: Sawa's Bayesian Information Criteria
SBC: Schwarz Bayesian Criteria
MSEP: Estimated error of prediction, assuming multivariate normality
FPE: Final Prediction Error
HSP: Hocking's Sp
APC: Amemiya Prediction Criteria
plot(modcompare)
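Several of the criteria in the summary above can be reproduced by hand from quantities already printed in this tutorial. The sketch below recomputes adjusted R-square, Mallows' Cp, and SBC for Model 3 (mastery, interest, perfgoal). The inputs are copied from the earlier output and are rounded, so the results should agree only to about three decimals, and the formulas shown are the standard textbook definitions, assumed here to match what olsrr uses.

```r
# Worked check for Model 3 using values printed earlier in the tutorial.
n        <- 140        # observations, from str(regdata)
p        <- 4          # coefficients in Model 3, including the intercept
p_full   <- 6          # coefficients in the full five-predictor model
r2       <- 0.40435677 # R-square of Model 3 (all possible subsets table)
rss      <- 134.473    # residual sum of squares of Model 3 (ANOVA table)
rss_full <- 132.772    # residual sum of squares of the full model (backward AIC summary)

# Adjusted R-square penalizes R-square for the number of coefficients
adjr <- 1 - (1 - r2) * (n - 1) / (n - p)

# Mallows' Cp compares the subset's RSS with the full model's error variance
mse_full <- rss_full / (n - p_full)
cp       <- rss / mse_full - n + 2 * p

# SBC (Schwarz criterion) uses a log(n) penalty per parameter, counting the error variance
sbc <- n * log(2 * pi) + n * log(rss / n) + n + log(n) * (p + 1)

round(c(adjr = adjr, cp = cp, sbc = sbc), 4)    # summary reports 0.3912, 3.7171, 416.3718
```

Model 3 has the lowest Cp, AIC, and SBC of the five models in the summary, while Model 4 has a marginally higher adjusted R-square. This shows how different criteria can favor different subsets.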
Information on the ‘olsrr’ package and variable selection procedures:
https://cran.r-project.org/web/packages/olsrr/olsrr.pdf
https://cran.r-project.org/web/packages/olsrr/vignettes/variable_selection.html
https://olsrr.rsquaredacademy.com/articles/variable_selection.html