Total Nonfarm Separations measures the number of U.S. workers, excluding proprietors, private household employees, unpaid volunteers, farm employees, and unincorporated self-employed workers, who have quit, been laid off or discharged, or otherwise severed employment with the company that employed them.
This data is collected by the U.S. Bureau of Labor Statistics (BLS) and reported monthly. The BLS describes separations as a "... measure [of] the number of workers who are projected to leave an occupation." These separations can be caused by people leaving the labor force, transferring to another occupation, retiring, dying, or being let go from a position for any reason. Per the BLS, "The separations method is designed to estimate the number of workers who leave their occupation and need to be replaced by new entrants into the occupation. It is not a measure of all movement in and out of occupations, but instead an estimate of workers who permanently leave an occupation."
The data come from the payrolls of all firms in the economy, excluding farm and farm-related industries, during the entire reference month, and are aggregated by type of separation. The results are published in the monthly Job Openings and Labor Turnover Survey (JOLTS) report.
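The series can also be pulled programmatically. Below is a minimal sketch using the quantmod package; the FRED series ID `JTSTSL` (Total Separations: Total Nonfarm) and the object name `separations` are our assumptions, not part of the original analysis:

```r
# Hypothetical retrieval of the JOLTS total separations series from FRED.
library(quantmod)

getSymbols("JTSTSL", src = "FRED")       # assigns an xts object named JTSTSL
separations <- ts(as.numeric(JTSTSL),
                  start = c(2000, 12),   # JOLTS coverage begins December 2000
                  frequency = 12)        # monthly observations
plot(separations, main = "Total Nonfarm Separations (thousands)")
```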
Methodological deficiencies can arise when gathering or analyzing the data. One problem occurs when the researcher holds a biased view of the data being collected; the results can then be skewed to show whatever the researcher wants them to show. Another problem arises when the sample size is wrong, whether too small to support meaningful inference or so large that even trivial effects register as significant. Overfitting is a further methodological deficiency: it occurs when a statistical model fits its training data too exactly, so the fitted model cannot predict new observations accurately.
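One common guard against overfitting is to hold out recent observations and compare in-sample and out-of-sample error. The sketch below is illustrative only (it uses `auto.arima` rather than the models estimated later) and assumes the `separations` object defined above:

```r
library(forecast)

n     <- length(separations)
train <- window(separations, end = time(separations)[n - 24])    # all but the last 24 months
test  <- window(separations, start = time(separations)[n - 23])  # 24-month holdout

fit <- auto.arima(train)   # let the data choose the order
fc  <- forecast(fit, h = 24)

accuracy(fc, test)         # compare training-set vs. test-set RMSE and MAE
```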
An autocorrelation function (ACF) plot of the series shows that the data are nonstationary, so we will need to transform the data to make them stationary. We conclude the data are nonstationary because the autocorrelations decay slowly, crossing back over zero before finally settling at zero. This pattern of nonstationarity signals that we should consult a partial autocorrelation (PACF) plot to determine how many lags to include in our ARIMA model.
The PACF plot shows positive correlation in the errors across the following fifteen observations, apart from two that are negative. This indicates that as many as fifteen lags could be informative for our model.
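A minimal sketch of the diagnostics described above, assuming the `separations` ts object from the earlier sketch:

```r
library(forecast)

Acf(separations)        # slow decay across the zero line suggests nonstationarity
Pacf(separations)       # significant spikes indicate candidate AR lag orders
Acf(diff(separations))  # the first difference should decay much more quickly
```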
Because the data are nonstationary, the appropriate specification is an ARIMA model estimated on the first difference of the series. The PACF shows meaningful correlation out to roughly 12-15 lags, while differencing reduces the magnitude of change in the series; the specification we settle on below uses seven autoregressive lags on the differenced data, i.e., an ARIMA(7,1,0).
This is the mathematical expression of the ARIMA model that we built in 3a:
\[
\Delta Y_t = c + \alpha \Delta Y_{t-1} + \beta \Delta Y_{t-2} + \gamma \Delta Y_{t-3} + \delta \Delta Y_{t-4} + \psi \Delta Y_{t-5} + \zeta \Delta Y_{t-6} + \eta \Delta Y_{t-7} + \epsilon_t
\]
The coefficient estimates for this model are as follows:

| Coefficient | Estimate (SE) |
|---|---|
| ar1 | -0.298*** (0.063) |
| ar2 | -0.642*** (0.065) |
| ar3 | -0.284*** (0.075) |
| ar4 | -0.379*** (0.073) |
| ar5 | -0.258*** (0.074) |
| ar6 | -0.188*** (0.064) |
| ar7 | -0.133** (0.062) |
| Observations | 248 |
| Log Likelihood | -1,989.029 |
| sigma² | 539,539.000 |
| Akaike Inf. Crit. | 3,994.058 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
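A minimal sketch of how this specification can be estimated with the forecast package, assuming the `separations` object from the earlier sketches (the original estimation code is not shown in the text):

```r
library(forecast)

# Seven autoregressive lags on the first-differenced series: ARIMA(7,1,0).
fit_arima <- Arima(separations, order = c(7, 1, 0))
summary(fit_arima)   # coefficient estimates, sigma^2, log likelihood, AIC
```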
Looking at the errors of the in-sample forecast, the model fares well going into 2009. After 2009 there is a brief period of downward movement, followed by a permanently higher level. This holds for roughly ten years, until March 2020, when the COVID-19 shock overwhelms the model. Because of the lags used in this model, the magnitude of the error in previous periods inflates the negative estimates going forward; this persists for seven periods until the model returns to a relatively steady state.
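The residual pattern described above can be inspected directly; a sketch reusing the hypothetical `fit_arima` object:

```r
library(forecast)

plot(residuals(fit_arima),
     main = "In-sample residuals, ARIMA(7,1,0)")  # spikes expected around 2009 and March 2020
abline(h = 0, lty = 2)
checkresiduals(fit_arima)                         # residual ACF plus Ljung-Box test
```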
We include two additional variables that may be correlated with separations in the economy.
The first variable we have chosen is the discount rate of the United States Federal Reserve. The relationship between the discount rate and the number of separations in the U.S. may be endogenous, so we must run a two-stage least squares regression in order to remove this endogeneity. The intuition behind this variable has to do with the effect of interest rates on businesses' and people's willingness to leave their professions. Lower discount rates can give entrepreneurs the leverage to chase long-awaited dreams as capital becomes cheaper to borrow. The relationship can also run the other way: capital can become too expensive to allow for a feasible leap of faith, or for the continuation of a highly leveraged start-up.
The second exogenous variable is the level of personal consumption expenditures. The intuition behind this series has to do with willingness to consume: people in strong financial positions are more willing both to spend and to separate from their jobs. As stated previously, this relationship can also be inverse; as people separate from their jobs they become less willing to consume, holding on to the money they have in order to ensure financial safety for tomorrow.
The mathematical representation of this model is:
\[
\Delta Y_t = c + \alpha \Delta Y_{t-1} + \beta \Delta Y_{t-2} + \gamma \Delta Y_{t-3} + \delta \Delta Y_{t-4} + \psi \Delta Y_{t-5} + \zeta \Delta Y_{t-6} + \eta \Delta Y_{t-7} + \omega\,\text{DiscountRate}_t + \sigma\,\text{PCE}_t + \epsilon_t
\]
Estimating this model yields:

| Coefficient | Estimate (SE) |
|---|---|
| ar1 | -0.323*** (0.072) |
| ar2 | -0.604*** (0.073) |
| ar3 | -0.257*** (0.086) |
| ar4 | -0.317*** (0.083) |
| ar5 | -0.211** (0.083) |
| ar6 | -0.139* (0.071) |
| ar7 | -0.091 (0.067) |
| Discount rate | -206.307 (126.786) |
| Personal consumption expenditures | -0.674*** (0.236) |
| Observations | 248 |
| Log Likelihood | -1,981.924 |
| sigma² | 509,908.800 |
| Akaike Inf. Crit. | 3,983.847 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
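A sketch of the augmented model, again under stated assumptions: `discount_rate` and `pce` are hypothetical monthly ts objects aligned with `separations`:

```r
library(forecast)

# The same ARIMA(7,1,0) with the two regressors supplied through xreg.
xreg_mat <- cbind(rate = discount_rate, pce = pce)
fit_xreg <- Arima(separations, order = c(7, 1, 0), xreg = xreg_mat)
summary(fit_xreg)
```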
Accuracy of model #3:
## ME RMSE MAE MPE MAPE MASE
## Training set 2.627615 126.3088 95.57258 0.005907425 1.971139 0.7563799
## ACF1
## Training set 0.02226018
Accuracy of model #4:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.6377617 124.8731 93.7021 -0.07062282 1.934453 0.7415766
## ACF1
## Training set 0.02371791

Model #4 posts a slightly lower RMSE (124.87 vs. 126.31) and MAE (93.70 vs. 95.57) than model #3, and its AIC (3,983.85 vs. 3,994.06) is also lower, so adding the discount rate and personal consumption expenditures modestly improves in-sample fit.
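These tables match the training-set output of `forecast::accuracy()`; a minimal sketch reusing the fitted objects from the earlier sketches:

```r
library(forecast)

accuracy(fit_arima)  # model #3: ARIMA(7,1,0)
accuracy(fit_xreg)   # model #4: ARIMA(7,1,0) with exogenous regressors
```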