Question One

1a » Provide a plot with national recessions indicated.


1b » Explain your data.

Total Nonfarm Separations measures the number of U.S. workers who have been laid off or have otherwise separated from the company that employed them. The measure excludes proprietors, private household employees, unpaid volunteers, farm employees, and unincorporated self-employed workers.

1c » How is your data collected/determined?

This data is collected by the U.S. Bureau of Labor Statistics and is reported monthly. The bureau describes separations as a “… measure the number of workers who are projected to leave an occupation.” These separations can result from people leaving the labor force, transferring to another occupation, retiring, dying, or being let go from their position for any reason. According to the bureau, “The separations method is designed to … estimate the number of workers who leave their occupation and need to be replaced by new entrants into the occupation. It is not a measure of all movement in and out of occupations, but instead an estimate of workers who permanently leave an occupation.”

1d » What is the source of the data?

The data come from the payrolls of all firms in the economy, excluding farm and farm-related industries, over the entire reference month, and are aggregated by type of separation. The results are then published monthly in the “Job Openings and Labor Turnover Survey (JOLTS).”

1e » Explain any potential methodological deficiencies in the data.

Methodological deficiencies can arise when there are problems gathering or analyzing the data. One problem arises when the researcher holds a biased view of the data being constructed, so that the results are skewed toward what the researcher wants them to show. Another arises when the sample size is off: too small to support an actual study, or too large for the results to be meaningful. Overfitting is also a methodological deficiency; it occurs when a statistical model fits its training data exactly. An overfit model cannot perform accurately on new data.

Question Two

2b » If not, what type of calculation do you plan to do to make it stationary? Re-plot if necessary.

Because the data is non-stationary, we take the first difference of the series and fit an ARIMA model with 12 lags. This conclusion is based on the PACF, which shows correlation with the current value at lags 12 to 15, and on the fact that first differencing reduces the magnitude of change in the series.
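The first-differencing step described above can be sketched as follows. This is a minimal illustration, not the code used for the assignment; the function name `first_difference` and the sample values are hypothetical.

```python
def first_difference(series):
    """Return the first difference: series[t] - series[t-1] for each t >= 1."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


# Hypothetical monthly separations levels (in thousands), for illustration only.
levels = [5100, 5150, 5080, 5200, 5300]
changes = first_difference(levels)
print(changes)  # [50, -70, 120, 100]
```

The differenced series is one observation shorter than the original, since the first value has no predecessor.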

Question Three

3a » Determine the best possible ARIMA model of the data.

3b » State the mathematical representation of this model. Provide coefficient estimates of this model.


This is the mathematical expression of the ARIMA model that we built in 3a:

\[ Y_t = c + \alpha Y_{t-1} + \beta Y_{t-2} + \gamma Y_{t-3} + \delta Y_{t-4} + \psi Y_{t-5} + \zeta Y_{t-6} + \eta Y_{t-7} + \epsilon_t \]
The coefficient estimates for this model are:

Dependent variable:
ar1 -0.298*** (0.063)
ar2 -0.642*** (0.065)
ar3 -0.284*** (0.075)
ar4 -0.379*** (0.073)
ar5 -0.258*** (0.074)
ar6 -0.188*** (0.064)
ar7 -0.133** (0.062)
Observations 248
Log Likelihood -1,989.029
sigma2 539,539.000
Akaike Inf. Crit. 3,994.058
Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01
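To make the role of the seven coefficients concrete, the sketch below computes a one-step-ahead prediction from the reported estimates. It assumes the prediction is a weighted sum of the seven most recent values of the modeled series and that the intercept is zero (no intercept appears in the table); the function name `one_step_forecast` and the input values are hypothetical.

```python
# AR coefficients ar1..ar7 as reported in the table above.
AR_COEFS = [-0.298, -0.642, -0.284, -0.379, -0.258, -0.188, -0.133]


def one_step_forecast(history, coefs=AR_COEFS, intercept=0.0):
    """Predict the next value from the most recent len(coefs) observations.

    `history` is ordered oldest to newest. The zero intercept is an
    assumption, made because the table reports no intercept estimate.
    """
    if len(history) < len(coefs):
        raise ValueError("need at least as many observations as lags")
    recent = history[::-1][:len(coefs)]  # newest first: y[t-1], y[t-2], ...
    return intercept + sum(phi * y for phi, y in zip(coefs, recent))


# Hypothetical values of the modeled series, oldest to newest.
example_history = [0, 0, 0, 0, 0, 0, 100]
print(one_step_forecast(example_history))
```

With only the most recent value non-zero, the prediction is simply ar1 times that value, which shows how a positive movement in one month is followed by a predicted partial reversal in the next.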

3c » Plot the in-sample forecasts that the model generates with the actual data.



3d » Plot the errors of the regression. What can you conclude from these plots?

Looking at the errors of the in-sample forecast, we see that the model fares well going into 2009. After 2009 there is a brief period of downward movement, after which the errors settle at a permanently higher level. This holds for about ten years, until March 2020, when the COVID-19 shock overwhelms the model. Because the model relies on seven lags, the magnitude of that error carries into the following periods and pushes the subsequent estimates further negative. This continues for seven periods, until the model returns to a relatively steady state.
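One way to see how a single large shock echoes through the lags is to trace the impulse response implied by the coefficients reported in 3b. The sketch below assumes a pure autoregressive structure with those seven coefficients; the function name `impulse_response` is hypothetical, and the output illustrates the mechanism rather than reproducing the plotted errors.

```python
# AR coefficients ar1..ar7 as reported in 3b.
AR_COEFS = [-0.298, -0.642, -0.284, -0.379, -0.258, -0.188, -0.133]


def impulse_response(coefs, horizon):
    """Effect of a one-unit shock at time 0 on the series at times 0..horizon.

    Uses the recursion psi[j] = sum_i coefs[i-1] * psi[j-i], with psi[0] = 1.
    """
    psi = [1.0]
    for j in range(1, horizon + 1):
        psi.append(sum(phi * psi[j - i]
                       for i, phi in enumerate(coefs, start=1)
                       if j - i >= 0))
    return psi


for step, effect in enumerate(impulse_response(AR_COEFS, 10)):
    print(step, round(effect, 3))
```

Because every coefficient is negative, the first responses to a positive shock are negative, which is consistent with the sequence of negative estimates described above.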

Question Four

4a » Using some type of economic intuition, include two or more exogenous regressors to your model. Fully explain the economic intuition you are relying on.

We are including two variables that may correlate with separations in the economy.

  1. The first variable we have chosen is the discount rate of the United States Federal Reserve. The relationship between the discount rate and the number of separations in the U.S. may be endogenous, so we must run a two-stage least squares regression to remove this endogeneity. The intuition behind using this variable is the knock-on effect of interest rates on the willingness of businesses or people to leave their profession. Lower discount rates can give entrepreneurs the leverage to chase their long-awaited dreams, as capital becomes cheaper to borrow. The relationship can also run the other way: capital can become too expensive to allow a feasible leap of faith or the continuation of a highly leveraged start-up.

  2. The second exogenous variable is the level of personal consumption expenditures. The intuition behind using this data set is willingness to consume: people in good financial positions are more willing to part with their capital. As with the first variable, the relationship can also run the other way. When people separate from their jobs they are less willing to consume, so they hold on to the money they have in order to ensure financial safety for tomorrow.
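The two-stage least squares idea mentioned for the discount rate can be sketched as below. This is a minimal single-regressor illustration, assuming one instrument is available; the write-up does not name an instrument, so `z`, the sample values, and the function names are all hypothetical.

```python
def ols(x, y):
    """Simple OLS of y on x with an intercept; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(z, x, y):
    """Stage 1: regress x on the instrument z. Stage 2: regress y on fitted x."""
    a0, a1 = ols(z, x)
    x_hat = [a0 + a1 * zi for zi in z]
    return ols(x_hat, y)


# Hypothetical data: z is an instrument, x the possibly endogenous regressor.
z = [1, 2, 3, 4, 5]
x = [2.1, 3.9, 6.2, 7.8, 10.0]
y = [1 + 2 * xi for xi in x]  # constructed so the true slope is 2
intercept, slope = two_stage_least_squares(z, x, y)
print(round(intercept, 6), round(slope, 6))
```

The second stage uses only the part of `x` that the instrument explains, which is what removes the endogenous variation when the instrument is valid.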

4b » Restate the mathematical representation of this model. Collect all necessary data and estimate your new model.

The mathematical representation of this model is: \[ Y_t = c + \alpha Y_{t-1} + \beta Y_{t-2} + \gamma Y_{t-3} + \delta Y_{t-4} + \psi Y_{t-5} + \zeta Y_{t-6} + \eta Y_{t-7} + \omega (rates) + \sigma (Personal\ Consumption\ Expenditures) + \epsilon_t \]

Estimating this model gives:

Dependent variable:
ar1 -0.323*** (0.072)
ar2 -0.604*** (0.073)
ar3 -0.257*** (0.086)
ar4 -0.317*** (0.083)
ar5 -0.211** (0.083)
ar6 -0.139* (0.071)
ar7 -0.091 (0.067)
value)1 -206.307 (126.786)
value)2 -0.674*** (0.236)
Observations 248
Log Likelihood -1,981.924
sigma2 509,908.800
Akaike Inf. Crit. 3,983.847
Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01

4c » Provide some type of economic test to determine if these regressors belong in your model.
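One possible check, offered here as a sketch rather than as the test actually run for this assignment, is a likelihood-ratio comparison built from the log likelihoods reported in 3b and 4b. It assumes the two models are nested, are fit to the same 248 observations, and differ only by the two added regressors.

```python
import math

# Log likelihoods as reported in 3b and 4b.
loglik_restricted = -1989.029    # AR terms only
loglik_unrestricted = -1981.924  # AR terms plus the two regressors
added_params = 2                 # number of regressors added in 4b

# Likelihood-ratio statistic: twice the gain in log likelihood.
lr_stat = 2 * (loglik_unrestricted - loglik_restricted)

# For a chi-square distribution with exactly 2 degrees of freedom,
# the upper-tail probability has the closed form exp(-x / 2).
p_value = math.exp(-lr_stat / 2)

print(round(lr_stat, 3), p_value)
```

Under these assumptions the statistic is about 14.21, above the 5% critical value of roughly 5.99 for two degrees of freedom, which would favor keeping the regressors jointly. The lower AIC reported in 4b (3,983.847 versus 3,994.058) points the same way.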

4d » Re-plot the predicted values and the new errors of the regression with the older ones.

Question Five

5a » Using the models from (3) and (4), you are to determine a test data set that is equal to 20 observations at the tail end of the data. Forecast each model in-sample and out-of-sample over the test data interval.
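The hold-out split described in this step can be sketched as follows. The function name `split_tail` is hypothetical, and the placeholder series stands in for the 248 observations used in the models.

```python
def split_tail(series, test_size=20):
    """Split a series into a training set and a test set taken from the tail."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]


# Placeholder series with the same length as the sample (248 observations).
observations = list(range(248))
train, test = split_tail(observations, test_size=20)
print(len(train), len(test))  # 228 20
```

Each model is then fit on `train` only, so that forecasts over the `test` interval are genuinely out-of-sample.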

5b » Calculate accuracy test statistics for all four forecast test data sets.

Accuracy of model #3

##                    ME     RMSE      MAE         MPE     MAPE      MASE
## Training set 2.627615 126.3088 95.57258 0.005907425 1.971139 0.7563799
##                    ACF1
## Training set 0.02226018

Accuracy of model #4

##                      ME     RMSE     MAE         MPE     MAPE      MASE
## Training set -0.6377617 124.8731 93.7021 -0.07062282 1.934453 0.7415766
##                    ACF1
## Training set 0.02371791
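For reference, the first five statistics in the output above follow standard definitions, with the error taken as actual minus predicted. The sketch below reimplements them on hypothetical values; it is not the routine that produced the output, and it omits MASE and ACF1.

```python
import math


def accuracy(actual, predicted):
    """Return ME, RMSE, MAE, MPE and MAPE for paired actual/predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    pct_errors = [100 * e / a for e, a in zip(errors, actual)]
    return {
        "ME": sum(errors) / n,                             # mean error
        "RMSE": math.sqrt(sum(e * e for e in errors) / n), # root mean squared error
        "MAE": sum(abs(e) for e in errors) / n,            # mean absolute error
        "MPE": sum(pct_errors) / n,                        # mean percentage error
        "MAPE": sum(abs(p) for p in pct_errors) / n,       # mean absolute percentage error
    }


# Hypothetical values, for illustration only.
stats = accuracy(actual=[100, 200, 400], predicted=[110, 190, 400])
for name, value in stats.items():
    print(name, round(value, 3))
```

Lower RMSE, MAE, and MAPE indicate a closer fit, while ME and MPE near zero indicate little systematic bias.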

5c » Compare each model. What can you conclude?