Correlation Plot

(for non-binary variables)

There don’t appear to be any major red flags in the correlations. Some variables are strongly correlated with each other, but none strongly enough to cause problems for the model.
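One way to make this kind of screen explicit is to list every pair of variables whose absolute Pearson correlation exceeds a cutoff. The sketch below is only an illustration: the variable names `x1`, `x2`, `x3`, the toy values, and the 0.8 cutoff are all assumptions and are not taken from the data behind the plot.

```python
import math
from itertools import combinations

# Hypothetical toy columns, NOT the survey data used in this report.
columns = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 4, 5, 8, 10, 13],
    "x3": [5, 1, 4, 2, 6, 3],
}

def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)

CUTOFF = 0.8  # assumed threshold; choose one that suits the analysis

# Collect every pair whose absolute correlation exceeds the cutoff.
flagged = []
for name_a, name_b in combinations(columns, 2):
    r = pearson(columns[name_a], columns[name_b])
    if abs(r) > CUTOFF:
        flagged.append((name_a, name_b, r))

for name_a, name_b, r in flagged:
    print(f"{name_a} ~ {name_b}: r = {r:.2f}")
```

Random forests are generally less sensitive to correlated predictors than linear models are, but strongly correlated predictors can still share or dilute variable importance, so a list like this is mostly useful when reading the VIMP results.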




Random Forest Training Output

##                          Sample size: 881
##                      Number of trees: 1000
##            Forest terminal node size: 5
##        Average no. of terminal nodes: 117.814
## No. of variables tried at each split: 15
##               Total no. of variables: 43
##        Resampling used to grow trees: swor
##     Resample size used to grow trees: 557
##                             Analysis: RF-R
##                               Family: regr
##                       Splitting rule: mse *random*
##        Number of random split points: 10
##                      (OOB) R squared: 0.22781441
##    (OOB) Requested performance error: 1.55686503

On the training data, the model had an OOB R-squared of about 0.23, meaning it explained roughly 23% of the out-of-bag variance in the outcome.
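Several numbers in the training output can be sanity-checked with simple arithmetic. The sketch below assumes that the number of variables tried at each split follows a p/3 rule, that sampling without replacement draws about 63.2% of the cases, and that the reported R-squared equals 1 minus the performance error divided by the outcome variance. These reflect my understanding of common randomForestSRC regression defaults and are assumptions, not something the output itself confirms.

```python
import math

# Values copied from the training output above.
n_train = 881
n_vars = 43
oob_r2 = 0.22781441
oob_err = 1.55686503

# Assumed default: variables tried per split = ceiling(p / 3).
mtry_guess = math.ceil(n_vars / 3)

# Assumed default for sampling without replacement: 63.2% of the cases.
resample_guess = round(0.632 * n_train)

# Assumed relationship: R^2 = 1 - error / Var(y), so Var(y) = error / (1 - R^2).
implied_var = oob_err / (1 - oob_r2)

print(mtry_guess)      # 15, matches "No. of variables tried at each split"
print(resample_guess)  # 557, matches "Resample size used to grow trees"
print(round(implied_var, 2))
```

If those assumptions hold, the implied variance of 2022 life satisfaction in the training sample is about 2.0, which gives the error of about 1.56 some context.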




Random Forest Testing Data Output

##   Sample size of test (predict) data: 369
##                 Number of grow trees: 1000
##   Average no. of grow terminal nodes: 117.814
##          Total no. of grow variables: 43
##        Resampling used to grow trees: swor
##     Resample size used to grow trees: 557
##                             Analysis: RF-R
##                               Family: regr
##                            R squared: 0.19801046
##          Requested performance error: 1.78474045

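The test data showed a similar R-squared of about 0.20. Note that this is a held-out test-set estimate, not an OOB one, and that the test error (1.78) is somewhat higher than the OOB error (1.56).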
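The gap between the OOB and test results can be quantified directly from the two outputs. As a rough sketch, and again assuming the reported R-squared is 1 minus the performance error divided by the outcome variance (an assumption about how these figures are computed), the comparison looks like this:

```python
# Values copied from the training (OOB) and test outputs above.
oob_r2, oob_err = 0.22781441, 1.55686503
test_r2, test_err = 0.19801046, 1.78474045

# Absolute drop in R-squared from OOB to test.
r2_drop = oob_r2 - test_r2

# Relative increase in the performance error from OOB to test.
err_increase = test_err / oob_err - 1

# Outcome variance implied by each pair, assuming R^2 = 1 - error / Var(y).
implied_var_train = oob_err / (1 - oob_r2)
implied_var_test = test_err / (1 - test_r2)

print(f"R-squared drop: {r2_drop:.3f}")
print(f"Error increase: {err_increase:.1%}")
print(f"Implied variance, train vs. test: "
      f"{implied_var_train:.2f} vs. {implied_var_test:.2f}")
```

Under that assumption, R-squared falls by about 0.03 while the error rises by roughly 15%. The implied outcome variance is also higher in the test sample, which would account for part of the higher test error without the model fitting much worse.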




VIMP Graph

As predicted, 2018 life satisfaction is by far the most important predictor of 2022 life satisfaction. Interestingly, none of the demographic predictors shows meaningful importance for predicting life satisfaction.
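For readers unfamiliar with variable importance, one common definition is permutation importance: shuffle one predictor, re-score the model, and see how much the error grows. The sketch below illustrates the idea on synthetic data with a fixed stand-in model. The names `prev` and `age`, the coefficients, and the model itself are hypothetical, and the report does not state which importance variant produced the graph, so this is not a reproduction of it.

```python
import random

rng = random.Random(0)
n = 500

# Synthetic predictors: a strong one ("prev") and a weak one ("age").
prev = [rng.gauss(0, 1) for _ in range(n)]
age = [rng.gauss(0, 1) for _ in range(n)]
y = [0.8 * p + 0.05 * a + rng.gauss(0, 0.5) for p, a in zip(prev, age)]

def predict(p, a):
    """Stand-in for a fitted model (hypothetical, not a random forest)."""
    return 0.8 * p + 0.05 * a

def mse(prev_col, age_col):
    """Mean squared error of the stand-in model on the given columns."""
    return sum((yi - predict(p, a)) ** 2
               for yi, p, a in zip(y, prev_col, age_col)) / n

baseline = mse(prev, age)

def permutation_importance(column_name):
    """Increase in MSE after shuffling one predictor column."""
    shuffled = list(prev if column_name == "prev" else age)
    rng.shuffle(shuffled)
    if column_name == "prev":
        permuted = mse(shuffled, age)
    else:
        permuted = mse(prev, shuffled)
    return permuted - baseline

imp_prev = permutation_importance("prev")
imp_age = permutation_importance("age")
print(f"prev: {imp_prev:.3f}, age: {imp_age:.3f}")
```

In this toy setup, shuffling the strong predictor raises the error substantially while shuffling the weak one barely changes it. That is the same qualitative pattern the graph shows for 2018 life satisfaction versus the demographic predictors.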