Project Overview

These data were extracted from Internal Revenue Service Form 990, which certain tax-exempt organizations must submit as part of their annual reporting. The Mohawk Valley has 328 tax-exempt organizations with annual revenues of more than $200,000, the threshold that requires a 990 filing. These data offer a snapshot of the 100 highest-grossing nonprofits across Oneida and Herkimer counties in upstate New York. We will explore these data throughout the term and eventually use the full data set. Although longitudinal data are available, this deliverable focuses on the last full reporting year, 2018.

Even though IRS Form 990 allows for considerable dimensionality, with 32 features, we have elected to use four variables because they offer the most complete data with the fewest missing values. As with for-profit companies, much can be gleaned from four major fiscal reporting categories – revenue, expenses, assets, and liabilities – to measure the overall health of a tax-exempt organization.

For modeling, we will use the least squares estimator (LSE) – minimizing the sum of squared errors – followed by ridge and lasso regression, two supervised models. The deliverable closes with a holistic comparison of the three. These linear models are considered the “go-to as the first algorithm to try, good for very large datasets, and good for very high-dimensionality data” (Müller & Guido, 2016). Ultimately, supervised learning is used whenever we want to predict a certain outcome from a given input, with the goal of making accurate predictions for new, never-before-seen data (Müller & Guido, 2016).
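Concretely, all three estimators minimize the residual sum of squares, with ridge and lasso adding a penalty on coefficient size (stated here for reference in standard notation, not drawn from the sources above):

$$\hat{\beta}_{\text{LSE}} = \arg\min_{\beta} \sum_{i=1}^{n} \left( y_i - x_i^\top \beta \right)^2$$

$$\hat{\beta}_{\text{ridge}} = \arg\min_{\beta} \sum_{i=1}^{n} \left( y_i - x_i^\top \beta \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$

$$\hat{\beta}_{\text{lasso}} = \arg\min_{\beta} \sum_{i=1}^{n} \left( y_i - x_i^\top \beta \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$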

Load Data & Library

df <- read.csv("C:/Users/bjorzech/Desktop/DSC609_W2_NP.csv",stringsAsFactors = FALSE)
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2

Preprocessing

Some preprocessing steps are needed to run the models correctly. The organization's name is not used in the modeling, and the four fiscal variables are cast to numeric for consistency, since they load as integers. For the flow of the deliverable, we print the structure and the head to show the first six records in the data set of 100 tax-exempt organizations.

df$revenue <- as.numeric(df$revenue)
df$expenses <- as.numeric(df$expenses)
df$liabilities <- as.numeric(df$liabilities)
df$assets <- as.numeric(df$assets)
str(df)
## 'data.frame':    100 obs. of  5 variables:
##  $ organization: chr  "Faxton-St Lukes Healthcare" "St Elizabeth Medical Center" "Trustees of Hamilton College" "Utica College" ...
##  $ revenue     : num  3.03e+08 2.22e+08 2.00e+08 9.98e+07 9.34e+07 ...
##  $ expenses    : num  3.04e+08 2.23e+08 1.82e+08 9.39e+07 9.13e+07 ...
##  $ liabilities : num  1.08e+08 1.13e+08 2.76e+08 6.91e+07 2.05e+07 ...
##  $ assets      : num  2.61e+08 1.17e+08 1.39e+09 1.17e+08 4.89e+07 ...
head(df)
##                   organization   revenue  expenses liabilities     assets
## 1   Faxton-St Lukes Healthcare 302766033 303562622   108477839  260787709
## 2  St Elizabeth Medical Center 221958072 222966808   113017092  117179581
## 3 Trustees of Hamilton College 199948992 182057650   276031568 1393517194
## 4                Utica College  99799776  93885974    69080233  117018066
## 5   Upstate Cerebral Palsy Inc  93374322  91328400    20508445   48870808
## 6   Rome Memorial Hospital Inc  85724662  88871216    27876413   55501103

Normalize

Additionally, the preprocessing step of normalizing the data is essential. If variables on different scales are to be used together, such a transformation is often necessary to avoid having a variable with large values dominate the results of the analysis (Tan et al., 2019, p. 71). Here it is applied to revenue, expenses, assets, and liabilities.

preproc1 <- preProcess(df[,c(2:5)], method=c("center", "scale"))
norm1 <- predict(preproc1, df[,c(2:5)])
summary(norm1)
##     revenue            expenses        liabilities          assets       
##  Min.   :-0.39552   Min.   :-0.3965   Min.   :-0.1675   Min.   :-0.2257  
##  1st Qu.:-0.38001   1st Qu.:-0.3700   1st Qu.:-0.1659   1st Qu.:-0.2153  
##  Median :-0.35221   Median :-0.3345   Median :-0.1595   Median :-0.2012  
##  Mean   : 0.00000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000  
##  3rd Qu.:-0.08727   3rd Qu.:-0.1178   3rd Qu.:-0.1147   3rd Qu.:-0.1516  
##  Max.   : 6.30925   Max.   : 6.4626   Max.   : 9.6245   Max.   : 7.0674
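For reference, this centering and scaling matches base R's scale(), which subtracts each column's mean and divides by its standard deviation; a quick sketch to confirm the equivalence:

norm_base <- as.data.frame(scale(df[, 2:5]))          # (x - mean) / sd, column by column
all.equal(norm_base, norm1, check.attributes = FALSE) # should return TRUE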

Fitting LSE Regression

The following LSE model was created to understand the relationship among the variables. Linear models can be characterized as regression models whose prediction is a line for a single feature, a plane when using two features, or a hyperplane in higher dimensions (Müller & Guido, 2016). With many features relative to the number of observations, linear models also carry a higher chance of overfitting.

lse <- lm(revenue ~ ., data=norm1)
summary(lse)
## 
## Call:
## lm(formula = revenue ~ ., data = norm1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.09338 -0.01416 -0.00831 -0.00279  0.73190 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.350e-16  8.095e-03   0.000 1.000000    
## expenses     9.722e-01  9.385e-03 103.592  < 2e-16 ***
## liabilities -2.613e-02  1.564e-02  -1.670 0.098088 .  
## assets       6.805e-02  1.688e-02   4.031 0.000111 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08095 on 96 degrees of freedom
## Multiple R-squared:  0.9936, Adjusted R-squared:  0.9934 
## F-statistic:  5004 on 3 and 96 DF,  p-value: < 2.2e-16

The R-squared is assessed to find how well the regression model fits the observed data, that is, how much of the variation in revenue the independent variables explain. In this case, the variables show statistical significance save for liabilities, which is only marginal (p = 0.098). A multiple R-squared of 0.9936 and an adjusted R-squared of 0.9934 indicate a very strong model fit.
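As an aside, these statistics can be pulled directly from the fitted model object rather than read off the printed summary:

fit <- summary(lse)
fit$r.squared       # multiple R-squared
fit$adj.r.squared   # adjusted R-squared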

We also plot the findings below, simply to show the relationships among these four variables. Plots are used sparingly in this deliverable, and only where they connect to the final outcome in a clear, meaningful way, especially with lasso regression.

library(tidyverse)
## -- Attaching packages -------------------------------------------- tidyverse 1.3.0 --
## v tibble  3.0.1     v dplyr   0.8.5
## v tidyr   1.0.3     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0
## v purrr   0.3.4
## -- Conflicts ----------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## x purrr::lift()   masks caret::lift()
library(GGally)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
## 
## Attaching package: 'GGally'
## The following object is masked from 'package:dplyr':
## 
##     nasa
pairs(~ revenue + expenses + assets + liabilities, data = norm1, 
      lower.panel = panel.smooth, main = 'Relationship Between Key Fiscal MV Nonprofit Variables - 2018')

Cross-Validate LSE Regression

A ten-fold cross-validation is performed, which changes the estimated strength of fit: performance is now measured on held-out folds rather than on the training data alone. In the ridge and lasso models that follow, a tuning parameter is also included to control the amount of shrinkage; coefficient values are shrunk toward zero, and the tuning parameter sets the strength of the penalty.

kfold <- trainControl(method = "cv", number = 10)
lse <- train(revenue ~ .,
             data = norm1,
             trControl = kfold,
             method = "lm")
lse
lse
## Linear Regression 
## 
## 100 samples
##   3 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 88, 91, 90, 90, 89, 91, ... 
## Resampling results:
## 
##   RMSE       Rsquared   MAE       
##   0.1471697  0.9532927  0.05809335
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE

Ridge Regression

With ridge regression, the coefficients are chosen not only so they predict well on the training data, but also so they fit an additional constraint (Müller & Guido, 2016). We want the magnitude of the coefficients to be as small as possible; that is, all entries of the coefficient vector should be close to zero. Each feature should have as little effect on the outcome as possible while still predicting well (Müller & Guido, 2016).

Additionally, the tuning parameter lambda is chosen by cross-validation. When lambda is small, the result is essentially the least squares estimate; as lambda increases, coefficients are shrunk more strongly toward zero (although ridge, unlike lasso, rarely sets them exactly to zero). In glmnet's parameterization, alpha = 0 selects the ridge penalty, and decreasing lambda allows the coefficients to be less restricted.

lambda <- 10^seq(10, -2, length = 100)
ridge <- train(revenue ~ .,
               data = norm1,
               trControl = kfold,
               method = "glmnet",
               tuneGrid = expand.grid(alpha = 0, lambda = lambda))
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

The results are then printed.

ridge$results
##     alpha       lambda      RMSE  Rsquared       MAE    RMSESD RsquaredSD
## 1       0 1.000000e-02 0.3613550 0.9295957 0.1524341 0.8065221  0.1992586
## 2       0 1.321941e-02 0.3613550 0.9295957 0.1524341 0.8065221  0.1992586
## 3       0 1.747528e-02 0.3613550 0.9295957 0.1524341 0.8065221  0.1992586
## 4       0 2.310130e-02 0.3613550 0.9295957 0.1524341 0.8065221  0.1992586
## 5       0 3.053856e-02 0.3613550 0.9295957 0.1524341 0.8065221  0.1992586
## 6       0 4.037017e-02 0.3613550 0.9295957 0.1524341 0.8065221  0.1992586
## 7       0 5.336699e-02 0.3613550 0.9295957 0.1524341 0.8065221  0.1992586
## 8       0 7.054802e-02 0.3613550 0.9295957 0.1524341 0.8065221  0.1992586
## 9       0 9.326033e-02 0.3666040 0.9295914 0.1550639 0.8054378  0.1992569
## 10      0 1.232847e-01 0.3903458 0.9295673 0.1673926 0.8263672  0.1990657
## 11      0 1.629751e-01 0.4239037 0.9295595 0.1855841 0.8565065  0.1986461
## 12      0 2.154435e-01 0.4606231 0.9295727 0.2064178 0.8814519  0.1980767
## 13      0 2.848036e-01 0.5000989 0.9296017 0.2300091 0.8998223  0.1973321
## 14      0 3.764936e-01 0.5416043 0.9296407 0.2561987 0.9103199  0.1963787
## 15      0 4.977024e-01 0.5834348 0.9296772 0.2839235 0.9099325  0.1952102
## 16      0 6.579332e-01 0.6238334 0.9296964 0.3123336 0.8969442  0.1938302
## 17      0 8.697490e-01 0.6608256 0.9296822 0.3403882 0.8703305  0.1922650
## 18      0 1.149757e+00 0.6925419 0.9296233 0.3670021 0.8304313  0.1905599
## 19      0 1.519911e+00 0.7177330 0.9295136 0.3912661 0.7801116  0.1887809
## 20      0 2.009233e+00 0.7358387 0.9293560 0.4126738 0.7244693  0.1870033
## 21      0 2.656088e+00 0.7471105 0.9291607 0.4310091 0.6705340  0.1852999
## 22      0 3.511192e+00 0.7524667 0.9289431 0.4460741 0.6258382  0.1837307
## 23      0 4.641589e+00 0.7532591 0.9287197 0.4580472 0.5962824  0.1823356
## 24      0 6.135907e+00 0.7509994 0.9285047 0.4673099 0.5840211  0.1811334
## 25      0 8.111308e+00 0.7471367 0.9283081 0.4743318 0.5868819  0.1801245
## 26      0 1.072267e+01 0.7429514 0.9281358 0.4795773 0.5998662  0.1792964
## 27      0 1.417474e+01 0.7395973 0.9279896 0.4834594 0.6173565  0.1786288
## 28      0 1.873817e+01 0.7382789 0.9278689 0.4863187 0.6344414  0.1780981
## 29      0 2.477076e+01 0.7402739 0.9277710 0.4884219 0.6473512  0.1776811
## 30      0 3.274549e+01 0.7459023 0.9276931 0.4906406 0.6545115  0.1773563
## 31      0 4.328761e+01 0.7534892 0.9276317 0.4976583 0.6574214  0.1771050
## 32      0 5.722368e+01 0.7610440 0.9275839 0.5031059 0.6583671  0.1769117
## 33      0 7.564633e+01 0.7676167 0.9275468 0.5073093 0.6586173  0.1767635
## 34      0 1.000000e+02 0.7729962 0.9275183 0.5105376 0.6586581  0.1766503
## 35      0 1.321941e+02 0.7772678 0.9274964 0.5130082 0.6586499  0.1765640
## 36      0 1.747528e+02 0.7806036 0.9274797 0.5148937 0.6586388  0.1764984
## 37      0 2.310130e+02 0.7831826 0.9274670 0.5163295 0.6586344  0.1764485
## 38      0 3.053856e+02 0.7851638 0.9274573 0.5174213 0.6586362  0.1764107
## 39      0 4.037017e+02 0.7866791 0.9274499 0.5182503 0.6586415  0.1763820
## 40      0 5.336699e+02 0.7878347 0.9274443 0.5188793 0.6586483  0.1763603
## 41      0 7.054802e+02 0.7887140 0.9274401 0.5193561 0.6586552  0.1763438
## 42      0 9.326033e+02 0.7898122 0.9147865 0.5199455 0.6595452  0.1973322
## 43      0 1.232847e+03 0.7914733       NaN 0.5208432 0.6586878         NA
## 44      0 1.629751e+03 0.7914733       NaN 0.5208432 0.6586878         NA
## 45      0 2.154435e+03 0.7914733       NaN 0.5208432 0.6586878         NA
## 46      0 2.848036e+03 0.7914733       NaN 0.5208432 0.6586878         NA
## 47      0 3.764936e+03 0.7914733       NaN 0.5208432 0.6586878         NA
## 48      0 4.977024e+03 0.7914733       NaN 0.5208432 0.6586878         NA
## 49      0 6.579332e+03 0.7914733       NaN 0.5208432 0.6586878         NA
## 50      0 8.697490e+03 0.7914733       NaN 0.5208432 0.6586878         NA
## 51      0 1.149757e+04 0.7914733       NaN 0.5208432 0.6586878         NA
## 52      0 1.519911e+04 0.7914733       NaN 0.5208432 0.6586878         NA
## 53      0 2.009233e+04 0.7914733       NaN 0.5208432 0.6586878         NA
## 54      0 2.656088e+04 0.7914733       NaN 0.5208432 0.6586878         NA
## 55      0 3.511192e+04 0.7914733       NaN 0.5208432 0.6586878         NA
## 56      0 4.641589e+04 0.7914733       NaN 0.5208432 0.6586878         NA
## 57      0 6.135907e+04 0.7914733       NaN 0.5208432 0.6586878         NA
## 58      0 8.111308e+04 0.7914733       NaN 0.5208432 0.6586878         NA
## 59      0 1.072267e+05 0.7914733       NaN 0.5208432 0.6586878         NA
## 60      0 1.417474e+05 0.7914733       NaN 0.5208432 0.6586878         NA
## 61      0 1.873817e+05 0.7914733       NaN 0.5208432 0.6586878         NA
## 62      0 2.477076e+05 0.7914733       NaN 0.5208432 0.6586878         NA
## 63      0 3.274549e+05 0.7914733       NaN 0.5208432 0.6586878         NA
## 64      0 4.328761e+05 0.7914733       NaN 0.5208432 0.6586878         NA
## 65      0 5.722368e+05 0.7914733       NaN 0.5208432 0.6586878         NA
## 66      0 7.564633e+05 0.7914733       NaN 0.5208432 0.6586878         NA
## 67      0 1.000000e+06 0.7914733       NaN 0.5208432 0.6586878         NA
## 68      0 1.321941e+06 0.7914733       NaN 0.5208432 0.6586878         NA
## 69      0 1.747528e+06 0.7914733       NaN 0.5208432 0.6586878         NA
## 70      0 2.310130e+06 0.7914733       NaN 0.5208432 0.6586878         NA
## 71      0 3.053856e+06 0.7914733       NaN 0.5208432 0.6586878         NA
## 72      0 4.037017e+06 0.7914733       NaN 0.5208432 0.6586878         NA
## 73      0 5.336699e+06 0.7914733       NaN 0.5208432 0.6586878         NA
## 74      0 7.054802e+06 0.7914733       NaN 0.5208432 0.6586878         NA
## 75      0 9.326033e+06 0.7914733       NaN 0.5208432 0.6586878         NA
## 76      0 1.232847e+07 0.7914733       NaN 0.5208432 0.6586878         NA
## 77      0 1.629751e+07 0.7914733       NaN 0.5208432 0.6586878         NA
## 78      0 2.154435e+07 0.7914733       NaN 0.5208432 0.6586878         NA
## 79      0 2.848036e+07 0.7914733       NaN 0.5208432 0.6586878         NA
## 80      0 3.764936e+07 0.7914733       NaN 0.5208432 0.6586878         NA
## 81      0 4.977024e+07 0.7914733       NaN 0.5208432 0.6586878         NA
## 82      0 6.579332e+07 0.7914733       NaN 0.5208432 0.6586878         NA
## 83      0 8.697490e+07 0.7914733       NaN 0.5208432 0.6586878         NA
## 84      0 1.149757e+08 0.7914733       NaN 0.5208432 0.6586878         NA
## 85      0 1.519911e+08 0.7914733       NaN 0.5208432 0.6586878         NA
## 86      0 2.009233e+08 0.7914733       NaN 0.5208432 0.6586878         NA
## 87      0 2.656088e+08 0.7914733       NaN 0.5208432 0.6586878         NA
## 88      0 3.511192e+08 0.7914733       NaN 0.5208432 0.6586878         NA
## 89      0 4.641589e+08 0.7914733       NaN 0.5208432 0.6586878         NA
## 90      0 6.135907e+08 0.7914733       NaN 0.5208432 0.6586878         NA
## 91      0 8.111308e+08 0.7914733       NaN 0.5208432 0.6586878         NA
## 92      0 1.072267e+09 0.7914733       NaN 0.5208432 0.6586878         NA
## 93      0 1.417474e+09 0.7914733       NaN 0.5208432 0.6586878         NA
## 94      0 1.873817e+09 0.7914733       NaN 0.5208432 0.6586878         NA
## 95      0 2.477076e+09 0.7914733       NaN 0.5208432 0.6586878         NA
## 96      0 3.274549e+09 0.7914733       NaN 0.5208432 0.6586878         NA
## 97      0 4.328761e+09 0.7914733       NaN 0.5208432 0.6586878         NA
## 98      0 5.722368e+09 0.7914733       NaN 0.5208432 0.6586878         NA
## 99      0 7.564633e+09 0.7914733       NaN 0.5208432 0.6586878         NA
## 100     0 1.000000e+10 0.7914733       NaN 0.5208432 0.6586878         NA
##         MAESD
## 1   0.2751543
## 2   0.2751543
## 3   0.2751543
## 4   0.2751543
## 5   0.2751543
## 6   0.2751543
## 7   0.2751543
## 8   0.2751543
## 9   0.2747915
## 10  0.2818401
## 11  0.2918999
## 12  0.3001985
## 13  0.3069780
## 14  0.3122734
## 15  0.3142675
## 16  0.3124791
## 17  0.3067143
## 18  0.2972846
## 19  0.2853558
## 20  0.2729424
## 21  0.2621219
## 22  0.2546186
## 23  0.2515956
## 24  0.2530395
## 25  0.2579722
## 26  0.2650061
## 27  0.2728477
## 28  0.2805532
## 29  0.2875615
## 30  0.2929351
## 31  0.2919988
## 32  0.2914119
## 33  0.2910463
## 34  0.2908186
## 35  0.2906762
## 36  0.2905865
## 37  0.2905292
## 38  0.2904922
## 39  0.2904679
## 40  0.2904516
## 41  0.2904405
## 42  0.2909043
## 43  0.2904129
## 44  0.2904129
## 45  0.2904129
## 46  0.2904129
## 47  0.2904129
## 48  0.2904129
## 49  0.2904129
## 50  0.2904129
## 51  0.2904129
## 52  0.2904129
## 53  0.2904129
## 54  0.2904129
## 55  0.2904129
## 56  0.2904129
## 57  0.2904129
## 58  0.2904129
## 59  0.2904129
## 60  0.2904129
## 61  0.2904129
## 62  0.2904129
## 63  0.2904129
## 64  0.2904129
## 65  0.2904129
## 66  0.2904129
## 67  0.2904129
## 68  0.2904129
## 69  0.2904129
## 70  0.2904129
## 71  0.2904129
## 72  0.2904129
## 73  0.2904129
## 74  0.2904129
## 75  0.2904129
## 76  0.2904129
## 77  0.2904129
## 78  0.2904129
## 79  0.2904129
## 80  0.2904129
## 81  0.2904129
## 82  0.2904129
## 83  0.2904129
## 84  0.2904129
## 85  0.2904129
## 86  0.2904129
## 87  0.2904129
## 88  0.2904129
## 89  0.2904129
## 90  0.2904129
## 91  0.2904129
## 92  0.2904129
## 93  0.2904129
## 94  0.2904129
## 95  0.2904129
## 96  0.2904129
## 97  0.2904129
## 98  0.2904129
## 99  0.2904129
## 100 0.2904129
ridge$bestTune$lambda
## [1] 0.07054802
coef(ridge$finalModel, ridge$bestTune$lambda)
## 4 x 1 sparse Matrix of class "dgCMatrix"
##                         1
## (Intercept)  5.520228e-17
## expenses     8.689033e-01
## liabilities -2.549446e-02
## assets       1.042747e-01

Since ridge is a more restricted model, we are less likely to overfit. After cross-validation, though, ridge's RMSE (0.36) is higher and its R-squared (0.93) slightly lower than the LSE's (0.15 and 0.95). A less complex model means worse performance on the training set, but potentially stronger generalization, and that trade-off is regularization (Müller & Guido, 2016).
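To see the shrinkage directly, the coefficient paths of the underlying glmnet fit can be plotted against lambda (a minimal sketch; ridge$finalModel is the glmnet object that caret stores):

plot(ridge$finalModel, xvar = "lambda", label = TRUE)   # coefficient paths vs. log(lambda)
abline(v = log(ridge$bestTune$lambda), lty = 2)         # mark the cross-validated lambda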

Lasso

As with ridge, lasso regression also restricts coefficients to be close to zero, but with L1 regularization. With L1 regularization, some of the coefficients become exactly zero, meaning some features are entirely ignored by the model (Müller & Guido, 2016).

We run the following model and change the alpha to 1.

lambda <- 10^seq(10, -2, length = 100)
lasso <- train(revenue ~ .,
               data = norm1,
               trControl = kfold,
               method = "glmnet",
               tuneGrid = expand.grid(alpha = 1, lambda = lambda))
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

Then we print the results.

lasso$results
##     alpha       lambda      RMSE  Rsquared        MAE     RMSESD  RsquaredSD
## 1       1 1.000000e-02 0.1223510 0.9870230 0.05215131 0.19773229 0.020506909
## 2       1 1.321941e-02 0.1216803 0.9871751 0.05251851 0.19082354 0.020014069
## 3       1 1.747528e-02 0.1210370 0.9873694 0.05305019 0.18219838 0.019383484
## 4       1 2.310130e-02 0.1202193 0.9876506 0.05417750 0.17063010 0.018479817
## 5       1 3.053856e-02 0.1195040 0.9880307 0.05609905 0.15622031 0.017268451
## 6       1 4.037017e-02 0.1173026 0.9885894 0.05815561 0.13737833 0.015546427
## 7       1 5.336699e-02 0.1147471 0.9893880 0.06083259 0.11467503 0.013197438
## 8       1 7.054802e-02 0.1123525 0.9906039 0.06456895 0.09117263 0.010084490
## 9       1 9.326033e-02 0.1125317 0.9924059 0.07158897 0.08459811 0.007355101
## 10      1 1.232847e-01 0.1322745 0.9927694 0.08636212 0.10142122 0.007236515
## 11      1 1.629751e-01 0.1643882 0.9927077 0.10738597 0.12703656 0.007243937
## 12      1 2.154435e-01 0.2083751 0.9927077 0.13524533 0.16471004 0.007243937
## 13      1 2.848036e-01 0.2680967 0.9927077 0.17210997 0.21756217 0.007243937
## 14      1 3.764936e-01 0.3482924 0.9927077 0.22153992 0.28981851 0.007243937
## 15      1 4.977024e-01 0.4552720 0.9927077 0.28718466 0.38714792 0.007243937
## 16      1 6.579332e-01 0.5974321 0.9927077 0.37434916 0.51716284 0.007243937
## 17      1 8.697490e-01 0.7634657 0.9933457 0.48013618 0.63662180 0.007379350
## 18      1 1.149757e+00 0.8294742       NaN 0.53058569 0.62862168          NA
## 19      1 1.519911e+00 0.8294742       NaN 0.53058569 0.62862168          NA
## 20      1 2.009233e+00 0.8294742       NaN 0.53058569 0.62862168          NA
## 21      1 2.656088e+00 0.8294742       NaN 0.53058569 0.62862168          NA
## 22      1 3.511192e+00 0.8294742       NaN 0.53058569 0.62862168          NA
## 23      1 4.641589e+00 0.8294742       NaN 0.53058569 0.62862168          NA
## 24      1 6.135907e+00 0.8294742       NaN 0.53058569 0.62862168          NA
## 25      1 8.111308e+00 0.8294742       NaN 0.53058569 0.62862168          NA
## 26      1 1.072267e+01 0.8294742       NaN 0.53058569 0.62862168          NA
## 27      1 1.417474e+01 0.8294742       NaN 0.53058569 0.62862168          NA
## 28      1 1.873817e+01 0.8294742       NaN 0.53058569 0.62862168          NA
## 29      1 2.477076e+01 0.8294742       NaN 0.53058569 0.62862168          NA
## 30      1 3.274549e+01 0.8294742       NaN 0.53058569 0.62862168          NA
## 31      1 4.328761e+01 0.8294742       NaN 0.53058569 0.62862168          NA
## 32      1 5.722368e+01 0.8294742       NaN 0.53058569 0.62862168          NA
## 33      1 7.564633e+01 0.8294742       NaN 0.53058569 0.62862168          NA
## 34      1 1.000000e+02 0.8294742       NaN 0.53058569 0.62862168          NA
## 35      1 1.321941e+02 0.8294742       NaN 0.53058569 0.62862168          NA
## 36      1 1.747528e+02 0.8294742       NaN 0.53058569 0.62862168          NA
## 37      1 2.310130e+02 0.8294742       NaN 0.53058569 0.62862168          NA
## 38      1 3.053856e+02 0.8294742       NaN 0.53058569 0.62862168          NA
## 39      1 4.037017e+02 0.8294742       NaN 0.53058569 0.62862168          NA
## 40      1 5.336699e+02 0.8294742       NaN 0.53058569 0.62862168          NA
## 41      1 7.054802e+02 0.8294742       NaN 0.53058569 0.62862168          NA
## 42      1 9.326033e+02 0.8294742       NaN 0.53058569 0.62862168          NA
## 43      1 1.232847e+03 0.8294742       NaN 0.53058569 0.62862168          NA
## 44      1 1.629751e+03 0.8294742       NaN 0.53058569 0.62862168          NA
## 45      1 2.154435e+03 0.8294742       NaN 0.53058569 0.62862168          NA
## 46      1 2.848036e+03 0.8294742       NaN 0.53058569 0.62862168          NA
## 47      1 3.764936e+03 0.8294742       NaN 0.53058569 0.62862168          NA
## 48      1 4.977024e+03 0.8294742       NaN 0.53058569 0.62862168          NA
## 49      1 6.579332e+03 0.8294742       NaN 0.53058569 0.62862168          NA
## 50      1 8.697490e+03 0.8294742       NaN 0.53058569 0.62862168          NA
## 51      1 1.149757e+04 0.8294742       NaN 0.53058569 0.62862168          NA
## 52      1 1.519911e+04 0.8294742       NaN 0.53058569 0.62862168          NA
## 53      1 2.009233e+04 0.8294742       NaN 0.53058569 0.62862168          NA
## 54      1 2.656088e+04 0.8294742       NaN 0.53058569 0.62862168          NA
## 55      1 3.511192e+04 0.8294742       NaN 0.53058569 0.62862168          NA
## 56      1 4.641589e+04 0.8294742       NaN 0.53058569 0.62862168          NA
## 57      1 6.135907e+04 0.8294742       NaN 0.53058569 0.62862168          NA
## 58      1 8.111308e+04 0.8294742       NaN 0.53058569 0.62862168          NA
## 59      1 1.072267e+05 0.8294742       NaN 0.53058569 0.62862168          NA
## 60      1 1.417474e+05 0.8294742       NaN 0.53058569 0.62862168          NA
## 61      1 1.873817e+05 0.8294742       NaN 0.53058569 0.62862168          NA
## 62      1 2.477076e+05 0.8294742       NaN 0.53058569 0.62862168          NA
## 63      1 3.274549e+05 0.8294742       NaN 0.53058569 0.62862168          NA
## 64      1 4.328761e+05 0.8294742       NaN 0.53058569 0.62862168          NA
## 65      1 5.722368e+05 0.8294742       NaN 0.53058569 0.62862168          NA
## 66      1 7.564633e+05 0.8294742       NaN 0.53058569 0.62862168          NA
## 67      1 1.000000e+06 0.8294742       NaN 0.53058569 0.62862168          NA
## 68      1 1.321941e+06 0.8294742       NaN 0.53058569 0.62862168          NA
## 69      1 1.747528e+06 0.8294742       NaN 0.53058569 0.62862168          NA
## 70      1 2.310130e+06 0.8294742       NaN 0.53058569 0.62862168          NA
## 71      1 3.053856e+06 0.8294742       NaN 0.53058569 0.62862168          NA
## 72      1 4.037017e+06 0.8294742       NaN 0.53058569 0.62862168          NA
## 73      1 5.336699e+06 0.8294742       NaN 0.53058569 0.62862168          NA
## 74      1 7.054802e+06 0.8294742       NaN 0.53058569 0.62862168          NA
## 75      1 9.326033e+06 0.8294742       NaN 0.53058569 0.62862168          NA
## 76      1 1.232847e+07 0.8294742       NaN 0.53058569 0.62862168          NA
## 77      1 1.629751e+07 0.8294742       NaN 0.53058569 0.62862168          NA
## 78      1 2.154435e+07 0.8294742       NaN 0.53058569 0.62862168          NA
## 79      1 2.848036e+07 0.8294742       NaN 0.53058569 0.62862168          NA
## 80      1 3.764936e+07 0.8294742       NaN 0.53058569 0.62862168          NA
## 81      1 4.977024e+07 0.8294742       NaN 0.53058569 0.62862168          NA
## 82      1 6.579332e+07 0.8294742       NaN 0.53058569 0.62862168          NA
## 83      1 8.697490e+07 0.8294742       NaN 0.53058569 0.62862168          NA
## 84      1 1.149757e+08 0.8294742       NaN 0.53058569 0.62862168          NA
## 85      1 1.519911e+08 0.8294742       NaN 0.53058569 0.62862168          NA
## 86      1 2.009233e+08 0.8294742       NaN 0.53058569 0.62862168          NA
## 87      1 2.656088e+08 0.8294742       NaN 0.53058569 0.62862168          NA
## 88      1 3.511192e+08 0.8294742       NaN 0.53058569 0.62862168          NA
## 89      1 4.641589e+08 0.8294742       NaN 0.53058569 0.62862168          NA
## 90      1 6.135907e+08 0.8294742       NaN 0.53058569 0.62862168          NA
## 91      1 8.111308e+08 0.8294742       NaN 0.53058569 0.62862168          NA
## 92      1 1.072267e+09 0.8294742       NaN 0.53058569 0.62862168          NA
## 93      1 1.417474e+09 0.8294742       NaN 0.53058569 0.62862168          NA
## 94      1 1.873817e+09 0.8294742       NaN 0.53058569 0.62862168          NA
## 95      1 2.477076e+09 0.8294742       NaN 0.53058569 0.62862168          NA
## 96      1 3.274549e+09 0.8294742       NaN 0.53058569 0.62862168          NA
## 97      1 4.328761e+09 0.8294742       NaN 0.53058569 0.62862168          NA
## 98      1 5.722368e+09 0.8294742       NaN 0.53058569 0.62862168          NA
## 99      1 7.564633e+09 0.8294742       NaN 0.53058569 0.62862168          NA
## 100     1 1.000000e+10 0.8294742       NaN 0.53058569 0.62862168          NA
##          MAESD
## 1   0.05945258
## 2   0.05709423
## 3   0.05414339
## 4   0.05057412
## 5   0.04658627
## 6   0.04177232
## 7   0.03657459
## 8   0.03284276
## 9   0.03725870
## 10  0.04749383
## 11  0.06053305
## 12  0.07798126
## 13  0.10127557
## 14  0.13184422
## 15  0.17210579
## 16  0.22507065
## 17  0.27227083
## 18  0.26044216
## 19  0.26044216
## 20  0.26044216
## 21  0.26044216
## 22  0.26044216
## 23  0.26044216
## 24  0.26044216
## 25  0.26044216
## 26  0.26044216
## 27  0.26044216
## 28  0.26044216
## 29  0.26044216
## 30  0.26044216
## 31  0.26044216
## 32  0.26044216
## 33  0.26044216
## 34  0.26044216
## 35  0.26044216
## 36  0.26044216
## 37  0.26044216
## 38  0.26044216
## 39  0.26044216
## 40  0.26044216
## 41  0.26044216
## 42  0.26044216
## 43  0.26044216
## 44  0.26044216
## 45  0.26044216
## 46  0.26044216
## 47  0.26044216
## 48  0.26044216
## 49  0.26044216
## 50  0.26044216
## 51  0.26044216
## 52  0.26044216
## 53  0.26044216
## 54  0.26044216
## 55  0.26044216
## 56  0.26044216
## 57  0.26044216
## 58  0.26044216
## 59  0.26044216
## 60  0.26044216
## 61  0.26044216
## 62  0.26044216
## 63  0.26044216
## 64  0.26044216
## 65  0.26044216
## 66  0.26044216
## 67  0.26044216
## 68  0.26044216
## 69  0.26044216
## 70  0.26044216
## 71  0.26044216
## 72  0.26044216
## 73  0.26044216
## 74  0.26044216
## 75  0.26044216
## 76  0.26044216
## 77  0.26044216
## 78  0.26044216
## 79  0.26044216
## 80  0.26044216
## 81  0.26044216
## 82  0.26044216
## 83  0.26044216
## 84  0.26044216
## 85  0.26044216
## 86  0.26044216
## 87  0.26044216
## 88  0.26044216
## 89  0.26044216
## 90  0.26044216
## 91  0.26044216
## 92  0.26044216
## 93  0.26044216
## 94  0.26044216
## 95  0.26044216
## 96  0.26044216
## 97  0.26044216
## 98  0.26044216
## 99  0.26044216
## 100 0.26044216
lasso$bestTune$lambda
## [1] 0.07054802
coef(lasso$finalModel, lasso$bestTune$lambda)
## 4 x 1 sparse Matrix of class "dgCMatrix"
##                        1
## (Intercept) 6.055488e-17
## expenses    9.250404e-01
## liabilities .           
## assets      .

A model is often easier to interpret and can reveal its most important features when some coefficients are exactly zero (Müller & Guido, 2016). Here the lasso zeroes out liabilities and assets entirely, leaving expenses as the only active predictor. At the best lambda, the cross-validated RMSE (0.112) and R-squared (0.991) compare favorably with the cross-validated LSE results (0.147 and 0.953).
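Before settling on a model, the three sets of cross-validation results can be laid side by side with caret's resamples() (a sketch; for a strict comparison the same fold assignments should be reused across models, e.g., by supplying a fixed index to trainControl()):

models <- resamples(list(LSE = lse, Ridge = ridge, Lasso = lasso))
summary(models)   # fold-by-fold RMSE, R-squared, and MAE for each model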

Summary

Since we are interested in generalization, we choose the ridge regression model over LSE and lasso. According to Müller and Guido (2016), in practice ridge regression is usually the preferred choice between ridge and lasso; lasso becomes attractive mainly when only a few of many features are expected to matter, since it identifies them by zeroing out the rest.

Had the ridge regression performed poorly, that would suggest an underfitted model; in this case, it performs well. This helps with future modeling, as we plan to use this data set further in the coming weeks.
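As a closing sketch, scoring a new, never-before-seen organization with the chosen ridge model would look like the following (the dollar figures are hypothetical; the raw values are passed through the same preProcess step, and the scaled prediction is converted back to dollars with the stored mean and standard deviation for revenue):

new_org <- data.frame(revenue = 0,          # placeholder; the target is ignored at prediction time
                      expenses = 5.0e6,     # hypothetical filer: $5.0M expenses
                      liabilities = 1.2e6,  # $1.2M liabilities
                      assets = 3.5e6)       # $3.5M assets
new_scaled <- predict(preproc1, new_org)    # apply the stored center/scale transform
pred_scaled <- predict(ridge, newdata = new_scaled)
pred_scaled * preproc1$std["revenue"] + preproc1$mean["revenue"]  # back to dollars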

References

Müller, A.C., & Guido, S. (2016). Introduction to machine learning with Python. Sebastopol, CA: O’Reilly Media, Inc.

Tan, P.-N., Steinbach, M., Karpatne, A., & Kumar, V. (2019). Introduction to data mining. New York, NY: Pearson Education, Inc.