Abstract
Did you know 401(k) plans were created almost by accident? They originated with the Revenue Act of 1978, which added a provision to the Internal Revenue Code allowing employees to avoid being taxed on deferred compensation (Stobierski, 2018). This study attempts to explain what affects the employee participation rate in employer-sponsored 401k plans. I found that a polynomial regression with the match rate as the main predictor was a good model, and all of its coefficients were statistically significant. However, the model is limited: the original data contained only seven independent variables, none of them strong predictors, which hurts practical significance. In addition, the models are based on only 1,534 plans, a small sample relative to the number of companies in the United States.
I am trying to estimate what affects the percentage of employees who participate in a company's 401k plan.
Using a simple linear regression, I try to predict the percentage of employees who participate in the 401k plan from the rate at which the employer matches employee contributions to the plan. I chose this form because, logically, employees are more likely to participate in a plan where the employer contributes more.
\(prate_{i} = \beta_{0} + \beta_{1}mrate_{i} + \epsilon_{i}\)
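As a rough illustration of how the intercept and slope in this specification are estimated, the sketch below computes the ordinary least squares estimates in closed form. The five data points are invented for the example and are not drawn from the study's data, and `fit_simple_ols` is a hypothetical helper name.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: forces the fitted line through the point of means.
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up match rates and participation rates, for illustration only.
mrate = [0.0, 0.5, 1.0, 1.5, 2.0]
prate = [80.0, 84.0, 86.0, 91.0, 94.0]

b0, b1 = fit_simple_ols(mrate, prate)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
```

With these toy numbers the fitted slope is 7, which would be read as a 7 percentage point rise in participation for each additional dollar of match.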
My second regression adds the log of total eligible employees to the first. It predicts the percentage of employees who participate in the 401k plan from the rate at which the employer matches employee contributions and the log of the total number of employees eligible to participate in the plan. Taking the log makes the distribution of eligible employees closer to normal and reduces the effect of outliers.
\(prate_{i} = \beta_{0} + \beta_{1}mrate_{i} + \beta_{2}ltotelg_{i} +\epsilon_{i}\)
My third model takes everything from the second model and adds a squared match rate term. I add the squared term because the participation rate cannot exceed 100 percent; beyond a certain match rate, additional employer contributions can no longer raise participation because the maximum has already been reached.
\(prate_{i} = \beta_{0} + \beta_{1}mrate_{i} + \beta_{2}ltotelg_{i} + \beta_{3}mrate^2_i +\epsilon_{i}\)
The data are cross-sectional, and the unit of analysis is a retirement plan (401k). The data are limited to a few measures of the plan itself. Thus, I must assume that the match rate is the most important causal driver of participation, which is a strong assumption and leads to bias in my regressions. I would have liked to have more data on things like the industry of the company, average annual salaries, average level of education, and other descriptive statistics of those who participated in the plans.
The dependent variable is prate, the percentage of employees who participate in the plan. The first independent variable is mrate, the rate at which the employer matches employee contributions to the plan. For example, a match rate of 0.21 implies the employer contributes $0.21 for every dollar an employee contributes, whereas a match rate of 1.42 implies the employer contributes $1.42 for every dollar the employee contributes. The other independent variable is ltotelg, the log of the total number of employees eligible to participate in the plan. Lastly, I want to check whether the age of the plan affects the model.
The following table details some of the descriptive statistics of key variables. The total number of eligible employees (totelg) is heavily right-skewed, with a mean of about 1,629 against a median of 330, which is why I chose to take its log for my model.
| | n | mean | sd | median | min | max | range |
|---|---|---|---|---|---|---|---|
| prate | 1534 | 87.3629075 | 16.7165374 | 95.699997 | 3.000000 | 100.00000 | 97.000000 |
| mrate | 1534 | 0.7315124 | 0.7795393 | 0.460000 | 0.010000 | 4.91000 | 4.900000 |
| totelg | 1534 | 1628.5345502 | 5370.7193562 | 330.000000 | 51.000000 | 70429.00000 | 70378.000000 |
| ltotelg | 1534 | 6.1353126 | 1.2899017 | 5.799093 | 3.931826 | 11.16236 | 7.230535 |
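The effect of the log transform can be checked directly against the table. The sketch below, which uses only the summary statistics reported above, confirms that ltotelg is the natural log of totelg and compares a crude skew measure before and after the transform. The `tail_ratio` helper is a hypothetical name introduced for this illustration, not a statistic used in the study.

```python
import math

# Summary statistics for totelg and ltotelg, copied from the table above.
totelg = {"median": 330.0, "min": 51.0, "max": 70429.0}
ltotelg = {"median": 5.799093, "min": 3.931826, "max": 11.16236}


def tail_ratio(s):
    """Distance from median to max divided by distance from min to median.

    A value near 1 suggests symmetry; a large value suggests right skew.
    """
    return (s["max"] - s["median"]) / (s["median"] - s["min"])


for stat, raw in totelg.items():
    # Compare the natural log of each raw statistic with the reported value.
    print(f"{stat}: log({raw:.0f}) = {math.log(raw):.6f}, table = {ltotelg[stat]}")

print(f"tail ratio, totelg:  {tail_ratio(totelg):.1f}")
print(f"tail ratio, ltotelg: {tail_ratio(ltotelg):.1f}")
```

The upper tail of totelg is roughly 250 times longer than its lower tail, while for ltotelg the ratio falls to about 3, which is consistent with the log pulling in the largest plans.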
Below are the regressions mentioned above. The significant coefficients look intriguing but are biased and lack practical significance. Interpretation of each model is listed after the tables.
| | Dependent variable: prate |
|---|---|
| mrate | 5.861*** (0.527) |
| Constant | 83.075*** (0.563) |
| Observations | 1,534 |
| R2 | 0.075 |
| Adjusted R2 | 0.074 |
| Residual Std. Error | 16.085 (df = 1532) |
| F Statistic | 123.685*** (df = 1; 1532) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
| | Dependent variable: prate (1) | Dependent variable: prate (2) |
|---|---|---|
| mrate | 5.413*** (0.516) | 15.760*** (1.426) |
| I(mrate^2) | | -2.961*** (0.382) |
| ltotelg | -2.857*** (0.312) | -2.684*** (0.307) |
| Constant | 100.934*** (2.023) | 95.682*** (2.097) |
| Observations | 1,534 | 1,534 |
| R2 | 0.123 | 0.156 |
| Adjusted R2 | 0.122 | 0.154 |
| Residual Std. Error | 15.666 (df = 1531) | 15.372 (df = 1530) |
| F Statistic | 107.241*** (df = 2; 1531) | 94.328*** (df = 3; 1530) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | |
One variable I wanted to make sure I was not omitting was the age of the plan. I checked this by adding a set of dummy variables for plan ages five through ten, using plans that are four to ten years old. For example, the estimated coefficient on age5 is 0.792. This implies that five-year-old plans have participation rates 0.79 percentage points higher than four-year-old plans (the omitted category), holding the match rate fixed. The pattern in the coefficients suggests that the relationship between age and participation is generally positive, meaning participation is generally higher in plans that have been around longer. However, all of these estimates are imprecisely measured. None of the age coefficients is statistically different from 0 individually, and the F-test in the last table below shows that they are jointly significant only at the 10 percent level (F = 1.80, p = 0.096).
| | Dependent variable: prate |
|---|---|
| mrate | 6.975*** (0.820) |
| age5 | 0.792 (10.311) |
| age6 | 2.413 (10.303) |
| age7 | 5.057 (10.258) |
| age8 | 3.240 (10.293) |
| age9 | 5.783 (10.374) |
| age10 | 7.951 (10.504) |
| Constant | 76.436*** (10.202) |
| Observations | 898 |
| R2 | 0.085 |
| Adjusted R2 | 0.078 |
| Residual Std. Error | 17.666 (df = 890) |
| F Statistic | 11.827*** (df = 7; 890) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
| Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) |
|---|---|---|---|---|---|
| 896 | 281147.5 | NA | NA | NA | NA |
| 890 | 277773.6 | 6 | 3373.924 | 1.801702 | 0.0957771 |
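The F statistic in the table above can be reproduced from the two residual sums of squares. The sketch below assumes the first row is the restricted model containing only mrate, which its 896 residual degrees of freedom suggest, and that the second row adds the six age dummies.

```python
# Values copied from the ANOVA table above.
rss_restricted = 281147.5    # assumed: model without the age dummies
rss_unrestricted = 277773.6  # model with the six age dummies
q = 6                        # number of restrictions (age5 through age10)
df_unrestricted = 890        # residual degrees of freedom, unrestricted model

# F = (drop in RSS per restriction) / (unrestricted residual variance)
f_stat = ((rss_restricted - rss_unrestricted) / q) / (
    rss_unrestricted / df_unrestricted
)
print(f"F = {f_stat:.3f}")
```

The result agrees with the reported 1.8017 up to rounding in the printed sums of squares.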
My first model is clearly biased because much more affects 401k participation than just the employer match rate. The estimate in table one shows that when the employer match rate increases by one dollar per dollar contributed, participation is estimated to rise by 5.861 percentage points. As for practical significance, a one standard deviation change in mrate (0.78) changes the participation rate by only about 4.6 percentage points, or about 5.2 percent of the mean participation rate. However, as previously mentioned, this estimate is biased because the model does not account for other variables.
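The practical significance figure can be worked out from the reported numbers. The sketch below multiplies the coefficient on mrate by its standard deviation and expresses the result both in percentage points and as a share of mean participation; all three inputs are copied from the tables above.

```python
# Values copied from the descriptive statistics and the first regression table.
beta_mrate = 5.861       # estimated coefficient on mrate
sd_mrate = 0.7795393     # standard deviation of mrate
mean_prate = 87.3629075  # mean participation rate, in percent

effect_points = beta_mrate * sd_mrate                # percentage points
effect_relative = 100 * effect_points / mean_prate   # percent of the mean

print(f"{effect_points:.2f} percentage points")
print(f"{effect_relative:.1f} percent of mean participation")
```

This gives about 4.57 percentage points, which is about 5.2 percent of the mean participation rate.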
The first model in table two accounts for the possibility that larger companies have a harder time achieving high participation in their 401k plans. To deal with the skewed distribution of the number of eligible employees, I used its log. The results support this idea: the coefficient on the match rate (5.413) is only slightly smaller than in table one, while the coefficient on the log of total eligible employees is negative. A one percent increase in the number of employees eligible for the plan is associated with an estimated decrease of 0.02857 percentage points in the participation rate.
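Because ltotelg enters in logs while prate is in levels, the coefficient is read by dividing it by 100 for a one percent increase in the number of eligible employees. The sketch below compares that rule of thumb with the exact change implied by the fitted equation and also evaluates a doubling of plan size. The function name `prate_change` is hypothetical.

```python
import math

beta_ltotelg = -2.857  # coefficient on ltotelg, model (1) of the second table


def prate_change(pct_increase):
    """Predicted change in prate (percentage points) when the number of
    eligible employees grows by pct_increase percent, holding mrate fixed."""
    return beta_ltotelg * math.log(1 + pct_increase / 100)


print(f"approximation for +1%: {beta_ltotelg / 100:.5f}")
print(f"exact for +1%:         {prate_change(1):.5f}")
print(f"exact for doubling:    {prate_change(100):.3f}")
```

The approximation and the exact value agree closely for small changes, while a doubling of eligible employees corresponds to a drop of roughly 2 percentage points.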
However, this is also not a good model: before even looking at the slope coefficients, the constant, which is the predicted participation rate when mrate and ltotelg are both zero, is 100.934 percent, above the 100 percent maximum.
Lastly, the final model seems to be the best and most accurate description of participation rates in 401k plans given the data I had.
The second model in table two captures the idea that the participation rate maxes out at 100 percent, so eventually, no matter how high employers set their match rate, participation will stop increasing. In table two, the coefficient on the squared mrate term is negative, meaning that the marginal effect of the match rate shrinks as the match rate rises.
The expected change in the employee participation rate when the employer match increases by $1 is \(15.760 - 2 \times 2.961 \, mrate\).
This implies that participation initially increases as the employer match rises but eventually tapers off. The match rate that maximizes predicted participation can be found by setting the expression above equal to 0 and solving for mrate. This yields an optimal match rate of approximately $2.66 per dollar contributed, controlling for the total number of employees eligible for the plan.
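The turning point can be computed from the two mrate coefficients in model (2). The sketch below evaluates the marginal effect at the median match rate of 0.46 and solves for the match rate at which the marginal effect reaches zero. The function name `marginal_effect` is hypothetical.

```python
# Coefficients copied from model (2) of the second regression table.
beta_mrate = 15.760
beta_mrate_sq = -2.961


def marginal_effect(mrate):
    """Derivative of predicted prate with respect to mrate."""
    return beta_mrate + 2 * beta_mrate_sq * mrate


# The marginal effect is zero where the fitted quadratic peaks.
turning_point = -beta_mrate / (2 * beta_mrate_sq)

print(f"marginal effect at mrate = 0.46 (median): {marginal_effect(0.46):.2f}")
print(f"turning point: mrate = {turning_point:.2f}")
```

At the median match rate the estimated marginal effect is still about 13 percentage points per dollar of match, and it falls to zero at a match rate of about 2.66.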
Across my regressions, participation in 401k plans generally increased as the match rate increased. Although the coefficients are statistically significant, the models lack the variables needed to properly estimate employee participation in 401k plans. I am unable to conclude that the match rate is the most important causal driver of participation. The data include numerous companies with low match rates and high participation rates, as well as companies with high match rates and low participation rates relative to the other companies in the data. I can more safely conclude that people generally respond to a higher incentive, in this case the match rate, but they do not always respond in the ways we expect.
I was unable to control for several variables that could bias my coefficients. For example, I was unable to control for the average debt of the employees eligible for the plan. If employees are focused on paying off their debt, they will most likely not participate in the 401k plan. Further research could help find better predictors of the participation rate, and including more companies in the study would also help. I would also have liked to have the industry of the company, average annual salaries, average level of education, and other descriptive statistics of those who participated in the plans.
Stobierski, T. (2018, March 30). 401(k) Basics: When It Was Invented and How It Works. Retrieved November 29, 2019, from https://www.northwesternmutual.com/life-and-money/your-401k-when-it-was-invented-and-why/.