All Employees, Total Nonfarm

From the St. Louis Fed: “All Employees: Total Nonfarm, commonly known as Total Nonfarm Payroll, is a measure of the number of U.S. workers in the economy that excludes proprietors, private household employees, unpaid volunteers, farm employees, and the unincorporated self-employed. This measure accounts for approximately 80 percent of the workers who contribute to Gross Domestic Product (GDP).”

The number is relevant because its month-to-month change shows the number of jobs gained or lost in the economy. Increases in employment indicate that businesses are hiring (growing); all else equal, adding employees means that aggregate disposable income has also increased. The BLS (Bureau of Labor Statistics) adjusts for seasonal effects so that the series shows non-seasonal changes.

Libraries

library(seasonal)
## Warning: package 'seasonal' was built under R version 3.6.2
library(fpp2)
## Warning: package 'fpp2' was built under R version 3.6.2
## Registered S3 method overwritten by 'xts':
##   method     from
##   as.zoo.xts zoo
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## ── Attaching packages ─────────────────────────────────────────────────────────────── fpp2 2.4 ──
## ✓ ggplot2   3.3.2     ✓ fma       2.4  
## ✓ forecast  8.13      ✓ expsmooth 2.3
## Warning: package 'ggplot2' was built under R version 3.6.2
## Warning: package 'forecast' was built under R version 3.6.2
## 
library(forecast)
library(knitr)

Importing and Data Processing

payems = read.csv("/Users/nelsonwhite/Documents/ms applied economics/Predictive Analytics:Forecasting/assignment 3/PAYEMS.csv")

head(payems)
##         DATE PAYEMS
## 1 1939-01-01  29923
## 2 1939-02-01  30100
## 3 1939-03-01  30280
## 4 1939-04-01  30094
## 5 1939-05-01  30299
## 6 1939-06-01  30502

All figures are in thousands, i.e. 29,923 thousand = 29,923,000 or nearly 30 million.

# start = c(year, month): the series begins in January 1939
payems.ts = ts(payems[,2], start=c(1939, 1), frequency=12)
autoplot(payems.ts, ylab="Thousands of Employees", xlab="Date", main="Total Employees, Nonfarm")
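
For reference, the `start` argument of `ts()` takes a (year, period) pair. A minimal base-R sketch, assuming a small hypothetical data frame `payems_demo` that mirrors the rows printed by `head(payems)`, derives that pair from the DATE column instead of typing it by hand:

```r
# Hypothetical stand-in for the first rows of PAYEMS.csv
# (values copied from the head(payems) output)
payems_demo <- data.frame(
  DATE = c("1939-01-01", "1939-02-01", "1939-03-01",
           "1939-04-01", "1939-05-01", "1939-06-01"),
  PAYEMS = c(29923, 30100, 30280, 30094, 30299, 30502),
  stringsAsFactors = FALSE
)

# Parse the first date and split it into year and month
first_date <- as.Date(payems_demo$DATE[1])
start_pair <- c(as.integer(format(first_date, "%Y")),
                as.integer(format(first_date, "%m")))

# Monthly series: frequency = 12, start = c(year, month)
demo_ts <- ts(payems_demo$PAYEMS, start = start_pair, frequency = 12)

print(start(demo_ts))      # year and month of the first observation
print(frequency(demo_ts))  # observations per year
```

Deriving the pair from the data avoids a mismatch if the CSV is later re-downloaded with a different first observation.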

The data fluctuates, but the increases tend to last for several years while the decreases last a year at most. Swings of irregular length like these are cyclical rather than seasonal. Several recessions are visible, in 1982-83, 2001, and 2008, along with a sharp dip in 2020 as a result of COVID-19.
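
One rough way to quantify how long rises and falls last is to take month-over-month changes and measure runs of the same sign. The sketch below is illustrative only: it uses the six values printed by `head(payems)` rather than the full series, and the names `levels_k`, `changes_k`, and `runs` are hypothetical.

```r
# First six monthly levels, in thousands of employees
levels_k <- c(29923, 30100, 30280, 30094, 30299, 30502)

# Month-over-month change: thousands of jobs gained (+) or lost (-)
changes_k <- diff(levels_k)

# Run-length encoding of the sign gives consecutive months
# moving in the same direction
runs <- rle(sign(changes_k))

print(changes_k)
print(data.frame(direction = runs$values, months = runs$lengths))
```

On the full series the same idea would be `rle(sign(as.numeric(diff(payems.ts))))`; the `as.numeric()` is needed because `rle()` expects a plain vector rather than a `ts` object.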

(lambda <- BoxCox.lambda(payems.ts))
## [1] 0.1868876
payems.boxcox = BoxCox(payems.ts, lambda=lambda)

autoplot(payems.boxcox)

BoxCox.lambda chose lambda = 0.1869 for the transformation. This data is tricky because of fluctuations in the 1940s and 1950s, followed by an upward trend with significant dips, culminating in a crater in 2020.
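
As a sketch of what the transformation does: for a nonzero lambda the Box-Cox transform is (y^lambda - 1) / lambda, and it reduces to log(y) when lambda is 0. The helpers `box_cox` and `inv_box_cox` below are hypothetical base-R stand-ins written for illustration, not the `forecast` package's implementation.

```r
# Box-Cox transform and its inverse (illustrative helpers)
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}
inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

lambda_demo <- 0.1868876          # value printed by BoxCox.lambda above
y <- c(29923, 30100, 30280)       # first three observations, in thousands

w <- box_cox(y, lambda_demo)      # transformed scale
print(w)
print(inv_box_cox(w, lambda_demo))  # round trip back to thousands
```

The transform is monotone, so the ordering of observations is preserved; only the spacing between large and small values changes.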

To forecast total nonfarm employees, I will use ETS, ARIMA, and STL, followed by diagnostics for each model and, finally, a discussion of which one fits the data best.

ETS

payems.ets = ets(payems.ts)
autoplot(payems.ets)

forecast(payems.ets)
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Nov 2020       142415.5 141317.9 143513.1 140736.8 144094.2
## Dec 2020       142458.0 140898.1 144018.0 140072.4 144843.7
## Jan 2021       142500.6 140580.7 144420.5 139564.3 145436.9
## Feb 2021       142543.2 140315.3 144771.0 139135.9 145950.4
## Mar 2021       142585.7 140082.6 145088.8 138757.6 146413.8
## Apr 2021       142628.3 139872.9 145383.7 138414.2 146842.3
## May 2021       142670.8 139680.1 145661.5 138096.9 147244.7
## Jun 2021       142713.4 139500.6 145926.1 137799.9 147626.9
## Jul 2021       142755.9 139331.7 146180.1 137519.1 147992.8
## Aug 2021       142798.5 139171.6 146425.3 137251.6 148345.3
## Sep 2021       142841.0 139018.8 146663.3 136995.4 148686.6
## Oct 2021       142883.6 138872.2 146894.9 136748.7 149018.4
## Nov 2021       142926.1 138731.0 147121.3 136510.2 149342.1
## Dec 2021       142968.7 138594.4 147343.0 136278.8 149658.6
## Jan 2022       143011.2 138461.9 147560.6 136053.6 149968.9
## Feb 2022       143053.8 138332.9 147774.6 135833.9 150273.7
## Mar 2022       143096.3 138207.2 147985.5 135619.1 150573.6
## Apr 2022       143138.9 138084.3 148193.5 135408.6 150869.2
## May 2022       143181.4 137964.0 148398.9 135202.1 151160.8
## Jun 2022       143224.0 137846.0 148602.0 134999.0 151449.0
## Jul 2022       143266.5 137730.0 148803.1 134799.2 151733.9
## Aug 2022       143309.1 137616.0 149002.2 134602.2 152016.0
## Sep 2022       143351.7 137503.6 149199.7 134407.9 152295.5
## Oct 2022       143394.2 137392.8 149395.6 134215.9 152572.5
autoplot(forecast(payems.ets))

accuracy(forecast(payems.ets))
##                     ME     RMSE     MAE        MPE      MAPE       MASE
## Training set -11.44142 730.9178 193.159 -0.0134986 0.2545081 0.09003549
##                    ACF1
## Training set 0.04426306
checkresiduals(payems.ets)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(M,A,N)
## Q* = 52.192, df = 20, p-value = 0.0001067
## 
## Model df: 4.   Total lags used: 24

From the time plot of the residuals, there appears to be autocorrelation in the 1940s and 1950s, and of course a severe dip in 2020. The ACF plot shows autocorrelation at lags 1, 3, and 5, and the Ljung-Box test (p-value = 0.0001067) rejects the hypothesis that the residuals are white noise. The histogram of the residuals is (mostly) normal, with the exception of outliers at -0.125 and -0.05.
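
The Ljung-Box p-value printed above can be checked by hand: under the null hypothesis of white-noise residuals, Q* is compared with a chi-squared distribution whose degrees of freedom equal the lags used minus the model degrees of freedom (24 - 4 = 20 here). The base-R sketch below redoes that arithmetic; `q_star`, `lags_used`, and `model_df` are hypothetical variable names holding the reported numbers.

```r
q_star    <- 52.192   # Q* reported for the ETS(M,A,N) residuals
lags_used <- 24       # "Total lags used"
model_df  <- 4        # "Model df"

# Degrees of freedom of the reference chi-squared distribution
df <- lags_used - model_df

# Upper-tail probability of observing a statistic at least this large
p_value <- pchisq(q_star, df = df, lower.tail = FALSE)

print(df)
print(signif(p_value, 4))
```

A p-value this far below 0.05 means the residuals still carry autocorrelation that the model has not captured.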

ETS, Transformed Data

payems.boxcox.ets = ets(payems.boxcox)
autoplot(payems.boxcox.ets)

forecast(payems.boxcox.ets)
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Nov 2020       43.79227 43.71859 43.86595 43.67959 43.90496
## Dec 2020       43.78481 43.67631 43.89331 43.61887 43.95075
## Jan 2021       43.77766 43.63960 43.91573 43.56651 43.98882
## Feb 2021       43.77081 43.60553 43.93609 43.51804 44.02359
## Mar 2021       43.76425 43.57305 43.95545 43.47183 44.05666
## Apr 2021       43.75796 43.54165 43.97427 43.42714 44.08878
## May 2021       43.75193 43.51105 43.99281 43.38354 44.12033
## Jun 2021       43.74615 43.48109 44.01122 43.34078 44.15153
## Jul 2021       43.74062 43.45167 44.02957 43.29871 44.18253
## Aug 2021       43.73532 43.42271 44.04792 43.25723 44.21340
## Sep 2021       43.73023 43.39417 44.06630 43.21627 44.24420
## Oct 2021       43.72536 43.36602 44.08471 43.17579 44.27494
## Nov 2021       43.72070 43.33822 44.10317 43.13575 44.30564
## Dec 2021       43.71622 43.31077 44.12168 43.09613 44.33632
## Jan 2022       43.71194 43.28364 44.14023 43.05692 44.36696
## Feb 2022       43.70783 43.25684 44.15882 43.01810 44.39757
## Mar 2022       43.70390 43.23034 44.17745 42.97966 44.42813
## Apr 2022       43.70013 43.20415 44.19610 42.94160 44.45865
## May 2022       43.69651 43.17826 44.21477 42.90391 44.48911
## Jun 2022       43.69305 43.15266 44.23344 42.86659 44.51951
## Jul 2022       43.68973 43.12735 44.25212 42.82964 44.54983
## Aug 2022       43.68655 43.10232 44.27079 42.79304 44.58006
## Sep 2022       43.68351 43.07757 44.28944 42.75680 44.61021
## Oct 2022       43.68059 43.05310 44.30808 42.72092 44.64025
autoplot(forecast(payems.boxcox.ets))

accuracy(payems.boxcox.ets)
##                       ME       RMSE        MAE       MPE       MAPE
## Training set 0.003877573 0.05390847 0.01867791 0.0103625 0.04981523
##                    MASE       ACF1
## Training set 0.08373486 0.03810632
checkresiduals(forecast(payems.boxcox.ets))

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(M,Ad,N)
## Q* = 16.485, df = 19, p-value = 0.6247
## 
## Model df: 5.   Total lags used: 24

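The residuals for the transformed data follow a pattern similar to those for the non-transformed data. The ACF plot suggests autocorrelation at lag 2 and possibly at lag 1, although the Ljung-Box test (p-value = 0.6247) does not reject the hypothesis that the residuals are white noise.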
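
The "Ad" in ETS(M,Ad,N) denotes a damped additive trend, so each step of the point forecast should be roughly a constant fraction (the damping parameter, usually written phi) of the previous step. As a rough check, the sketch below takes the first seven point forecasts from the table above and looks at the ratio of successive changes; `fc`, `steps`, and `step_ratio` are hypothetical names.

```r
# Nov 2020 to May 2021 point forecasts on the Box-Cox scale,
# copied from the forecast(payems.boxcox.ets) output
fc <- c(43.79227, 43.78481, 43.77766, 43.77081,
        43.76425, 43.75796, 43.75193)

# Change from one forecast month to the next
steps <- diff(fc)

# Ratio of each change to the previous one; for a damped trend
# this approximates the damping parameter
step_ratio <- steps[-1] / steps[-length(steps)]

print(round(steps, 5))
print(round(step_ratio, 3))
```

The ratios come out roughly constant and just below 1, which is consistent with a damped trend. The printed forecasts are rounded, so the ratios are only approximate.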

ARIMA

payems.arima = auto.arima(payems.ts)
autoplot(forecast(payems.arima))

forecast(payems.arima)
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Nov 2020       142443.7 141516.9 143370.6 141026.3 143861.2
## Dec 2020       142488.2 141141.6 143834.9 140428.7 144547.8
## Jan 2021       142604.7 141007.0 144202.4 140161.2 145048.2
## Feb 2021       142728.4 140920.1 144536.7 139962.8 145494.0
## Mar 2021       142843.2 140840.2 144846.2 139779.9 145906.6
## Apr 2021       142956.6 140775.2 145137.9 139620.5 146292.7
## May 2021       143071.0 140725.5 145416.6 139483.8 146658.3
## Jun 2021       143185.7 140686.9 145684.6 139364.1 147007.4
## Jul 2021       143300.3 140657.0 145943.6 139257.7 147342.9
## Aug 2021       143414.9 140634.6 146195.2 139162.8 147667.0
## Sep 2021       143529.4 140618.6 146440.3 139077.7 147981.2
## Oct 2021       143644.0 140608.2 146679.8 139001.1 148286.8
## Nov 2021       143758.5 140602.8 146914.3 138932.2 148584.9
## Dec 2021       143873.1 140601.7 147144.5 138869.9 148876.2
## Jan 2022       143987.6 140604.6 147370.7 138813.7 149161.5
## Feb 2022       144102.2 140611.1 147593.3 138763.0 149441.4
## Mar 2022       144216.8 140620.8 147812.7 138717.2 149716.3
## Apr 2022       144331.3 140633.5 148029.1 138676.0 149986.6
## May 2022       144445.9 140648.9 148242.8 138639.0 150252.8
## Jun 2022       144560.4 140666.9 148454.0 138605.8 150515.1
## Jul 2022       144675.0 140687.2 148662.8 138576.1 150773.8
## Aug 2022       144789.5 140709.6 148869.4 138549.9 151029.2
## Sep 2022       144904.1 140734.1 149074.1 138526.7 151281.5
## Oct 2022       145018.6 140760.5 149276.8 138506.4 151530.9
checkresiduals(payems.arima)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(2,1,0) with drift
## Q* = 4.4584, df = 21, p-value = 0.9999
## 
## Model df: 3.   Total lags used: 24
accuracy(payems.arima)
##                        ME     RMSE      MAE         MPE      MAPE
## Training set -0.006787446 721.7517 201.5263 0.001100507 0.2658125
##                    MASE         ACF1
## Training set 0.09393565 -0.001448402
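
To make the selected specification concrete: ARIMA(2,1,0) with drift models the month-to-month change as a constant (the drift) plus an AR(2) process. The sketch below fits that form with base R's `arima()` on simulated data, not on PAYEMS. The drift is estimated by passing a time index as a regressor, and all parameter values (`true_drift`, the AR coefficients, the seed) are made up for illustration.

```r
set.seed(42)
n <- 500
true_drift <- 2

# AR(2) noise around a constant change per period, cumulated into levels
noise <- as.numeric(arima.sim(model = list(ar = c(0.5, 0.2)), n = n))
y <- cumsum(true_drift + noise)

# With d = 1, arima() differences both y and the time index, so the
# regressor's coefficient is the average change per period (the drift)
fit <- arima(y, order = c(2, 1, 0), xreg = cbind(drift = seq_along(y)))
print(round(coef(fit), 3))

# Forecasting needs future values of the time-index regressor
h <- 12
fc <- predict(fit, n.ahead = h, newxreg = cbind(drift = n + seq_len(h)))
print(round(as.numeric(fc$pred), 1))
```

The estimated `drift` coefficient plays the same role as the steady monthly increase visible in the ARIMA point forecasts above.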

ARIMA, Transformed Data

payems.boxcox.arima = auto.arima(payems.boxcox)
autoplot(forecast(payems.boxcox.arima))

forecast(payems.boxcox.arima)
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Nov 2020       43.79962 43.73031 43.86894 43.69361 43.90564
## Dec 2020       43.79919 43.70027 43.89810 43.64791 43.95046
## Jan 2021       43.79875 43.67652 43.92098 43.61181 43.98569
## Feb 2021       43.79831 43.65591 43.94071 43.58052 44.01610
## Mar 2021       43.79787 43.63725 43.95850 43.55222 44.04353
## Apr 2021       43.79744 43.61993 43.97494 43.52596 44.06891
## May 2021       43.79700 43.60359 43.99040 43.50121 44.09279
## Jun 2021       43.79656 43.58800 44.00512 43.47759 44.11553
## Jul 2021       43.79612 43.57300 44.01925 43.45488 44.13736
## Aug 2021       43.79568 43.55847 44.03290 43.43289 44.15848
## Sep 2021       43.79525 43.54432 44.04617 43.41149 44.17900
## Oct 2021       43.79481 43.53050 44.05911 43.39059 44.19903
## Nov 2021       43.79437 43.51695 44.07179 43.37010 44.21864
## Dec 2021       43.79393 43.50363 44.08423 43.34995 44.23791
## Jan 2022       43.79349 43.49050 44.09649 43.33010 44.25688
## Feb 2022       43.79306 43.47753 44.10858 43.31051 44.27561
## Mar 2022       43.79262 43.46471 44.12053 43.29113 44.29411
## Apr 2022       43.79218 43.45201 44.13235 43.27193 44.31243
## May 2022       43.79174 43.43941 44.14408 43.25290 44.33059
## Jun 2022       43.79130 43.42690 44.15571 43.23400 44.34861
## Jul 2022       43.79087 43.41447 44.16726 43.21522 44.36652
## Aug 2022       43.79043 43.40210 44.17875 43.19654 44.38432
## Sep 2022       43.78999 43.38979 44.19019 43.17794 44.40204
## Oct 2022       43.78955 43.37753 44.20157 43.15942 44.41969
checkresiduals(payems.boxcox.arima)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,2,1)
## Q* = 22.018, df = 23, p-value = 0.5192
## 
## Model df: 1.   Total lags used: 24
accuracy(payems.boxcox.arima)
##                        ME      RMSE        MAE          MPE       MAPE
## Training set -0.002107303 0.0540057 0.02027267 -0.005653277 0.05412669
##                    MASE      ACF1
## Training set 0.09088429 0.1023833

STL

payems.stl = stl(payems.ts, s.window = "periodic")
autoplot(payems.stl)

forecast(payems.stl)
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Nov 2020       142379.6 141273.5 143485.7 140688.0 144071.3
## Dec 2020       142431.8 140860.6 144003.0 140028.8 144834.8
## Jan 2021       142470.4 140537.4 144403.3 139514.2 145426.6
## Feb 2021       142521.8 140279.8 144763.7 139093.0 145950.5
## Mar 2021       142579.4 140061.6 145097.1 138728.8 146430.0
## Apr 2021       142394.1 139623.8 145164.4 138157.2 146631.0
## May 2021       142510.4 139504.8 145516.0 137913.8 147107.1
## Jun 2021       142644.8 139417.5 145872.2 137709.1 147580.6
## Jul 2021       142723.3 139285.1 146161.5 137465.1 147981.6
## Aug 2021       142820.0 139179.9 146460.0 137252.9 148387.0
## Sep 2021       142887.9 139053.4 146722.4 137023.5 148752.2
## Oct 2021       142956.1 138933.6 146978.6 136804.3 149108.0
## Nov 2021       142962.8 138757.8 147167.8 136531.8 149393.8
## Dec 2021       143015.0 138632.3 147397.6 136312.3 149717.7
## Jan 2022       143053.6 138497.4 147609.7 136085.5 150021.6
## Feb 2022       143104.9 138379.1 147830.8 135877.4 150332.5
## Mar 2022       143162.5 138270.3 148054.8 135680.5 150644.6
## Apr 2022       142977.3 137921.6 148033.0 135245.2 150709.3
## May 2022       143093.6 137877.1 148310.1 135115.7 151071.5
## Jun 2022       143228.0 137853.2 148602.8 135008.0 151448.1
## Jul 2022       143306.5 137775.5 148837.5 134847.6 151765.4
## Aug 2022       143403.1 137718.0 149088.2 134708.5 152097.8
## Sep 2022       143471.1 137633.6 149308.5 134543.4 152398.7
## Oct 2022       143539.3 137551.2 149527.4 134381.3 152697.3
autoplot(forecast(payems.stl))

checkresiduals(forecast(payems.stl))
## Warning in checkresiduals(forecast(payems.stl)): The fitted degrees of
## freedom is based on the model used for the seasonally adjusted data.

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  ETS(M,A,N)
## Q* = 22.218, df = 20, p-value = 0.3288
## 
## Model df: 4.   Total lags used: 24
accuracy(forecast(payems.stl))
##                     ME     RMSE     MAE         MPE      MAPE       MASE
## Training set -11.77651 727.0691 202.275 -0.01391427 0.2643979 0.09428466
##                   ACF1
## Training set 0.0472515
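
The `s.window = "periodic"` setting used in the STL fit above forces the seasonal component to repeat identically every year. A minimal base-R sketch, using the built-in `nottem` temperature series purely as a stand-in for PAYEMS (the names `stl_fit`, `seasonal`, `jan`, and `recon` are hypothetical):

```r
# Decompose a built-in monthly series with a fixed seasonal pattern
stl_fit <- stl(nottem, s.window = "periodic")
seasonal <- stl_fit$time.series[, "seasonal"]

# With s.window = "periodic", every January (and every other month)
# gets the same seasonal value, so the range collapses to a point
jan <- seasonal[cycle(seasonal) == 1]
print(range(jan))

# Seasonal + trend + remainder add back up to the original series
recon <- rowSums(stl_fit$time.series)
print(max(abs(recon - as.numeric(nottem))))
```

For a seasonally adjusted series such as PAYEMS, a fixed seasonal component would be expected to be small relative to the trend.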

STL, Transformed Data

payems.boxcox.stl = stl(payems.boxcox, s.window = "periodic")
autoplot(payems.boxcox.stl)

forecast(payems.boxcox.stl)
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Nov 2020       43.78882 43.71544 43.86220 43.67659 43.90104
## Dec 2020       43.77989 43.67200 43.88778 43.61489 43.94490
## Jan 2021       43.76961 43.63251 43.90672 43.55993 43.97930
## Feb 2021       43.75970 43.59574 43.92366 43.50895 44.01045
## Mar 2021       43.75350 43.56400 43.94301 43.46368 44.04333
## Apr 2021       43.73001 43.51576 43.94426 43.40235 44.05768
## May 2021       43.72830 43.48984 43.96677 43.36360 44.09301
## Jun 2021       43.72778 43.46547 43.99009 43.32661 44.12895
## Jul 2021       43.72184 43.43595 44.00772 43.28461 44.15906
## Aug 2021       43.72098 43.41173 44.03024 43.24802 44.19395
## Sep 2021       43.71474 43.38229 44.04720 43.20630 44.22319
## Oct 2021       43.70884 43.35332 44.06435 43.16512 44.25255
## Nov 2021       43.70092 43.32247 44.07938 43.12213 44.27972
## Dec 2021       43.69520 43.29392 44.09648 43.08150 44.30891
## Jan 2022       43.68801 43.26401 44.11201 43.03956 44.33646
## Feb 2022       43.68107 43.23446 44.12769 42.99803 44.36412
## Mar 2022       43.67774 43.20861 44.14688 42.96026 44.39522
## Apr 2022       43.65701 43.16546 44.14856 42.90525 44.40877
## May 2022       43.65797 43.14410 44.17183 42.87208 44.44385
## Jun 2022       43.66001 43.12393 44.19608 42.84015 44.47986
## Jul 2022       43.65653 43.09836 44.21471 42.80288 44.51019
## Aug 2022       43.65806 43.07789 44.23823 42.77077 44.54535
## Sep 2022       43.65411 43.05206 44.25616 42.73335 44.57487
## Oct 2022       43.65042 43.02660 44.27424 42.69636 44.60447
autoplot(forecast(payems.boxcox.stl))

checkresiduals(forecast(payems.boxcox.stl))
## Warning in checkresiduals(forecast(payems.boxcox.stl)): The fitted degrees
## of freedom is based on the model used for the seasonally adjusted data.

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  ETS(M,Ad,N)
## Q* = 16.743, df = 19, p-value = 0.6073
## 
## Model df: 5.   Total lags used: 24
accuracy(forecast(payems.boxcox.stl))
##                       ME       RMSE       MAE         MPE       MAPE
## Training set 0.003563549 0.05363163 0.0193121 0.009528388 0.05135113
##                    MASE       ACF1
## Training set 0.08657796 0.04634693

Results

Raw Data:

Visually, ARIMA(2,1,0) with drift performs well. Barring a significant negative development, we can expect total nonfarm payrolls to continue increasing at a steady pace. ETS and STL also predicted a steady increase, but with wider prediction intervals.
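
The width of the intervals can be compared directly from the forecast tables. The sketch below copies the 95% bounds for Oct 2022 (the last forecast month) from the three raw-data outputs; the data frame `bounds` is a hypothetical container for those reported numbers.

```r
# 95% bounds for Oct 2022, copied from the raw-data forecast tables
bounds <- data.frame(
  model = c("ETS", "ARIMA", "STL"),
  lo95  = c(134215.9, 138506.4, 134381.3),
  hi95  = c(152572.5, 151530.9, 152697.3),
  stringsAsFactors = FALSE
)

# Interval width, in thousands of employees
bounds$width <- bounds$hi95 - bounds$lo95

# Narrowest interval first
print(bounds[order(bounds$width), ])
```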

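ARIMA also had the lowest RMSE (721.75, versus 730.92 for ETS and 727.07 for STL), although ETS had the lowest MAE, MAPE, and MASE. No spike in the ARIMA residual ACF plot crosses the significance bounds; the time plot of the residuals is mostly white noise until the crater in 2020; and the histogram of the residuals is normally distributed, again with the exception of the 2020 crater.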
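
To compare the training-set error measures side by side, the `accuracy()` outputs for the three raw-data models can be collected into one table. The sketch below copies the reported values by hand; `acc` and `best` are hypothetical names.

```r
# Training-set accuracy measures copied from the raw-data outputs
acc <- data.frame(
  model = c("ETS", "ARIMA", "STL"),
  RMSE  = c(730.9178, 721.7517, 727.0691),
  MAE   = c(193.159, 201.5263, 202.275),
  MAPE  = c(0.2545081, 0.2658125, 0.2643979),
  MASE  = c(0.09003549, 0.09393565, 0.09428466),
  stringsAsFactors = FALSE
)

# For each measure, the model with the smallest value
best <- sapply(acc[-1], function(m) acc$model[which.min(m)])
print(best)
```

These are in-sample measures, so they describe fit to the training data rather than out-of-sample forecast accuracy.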

Transformed Data:

The transformed data tells a different story. Because the graph is so much “flatter,” each forecasting method predicted either a decline or no change. These forecasts and their error measures are on the Box-Cox scale, so they cannot be compared directly with the raw-data figures. In addition, the residual plots for all of the methods showed more patterns at the beginning (1940s-50s), and the ACF plots had more spikes crossing the significance bounds.
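
Because the transformed models forecast on the Box-Cox scale, their point forecasts need to be inverted before they can be read as thousands of employees. A minimal sketch, assuming the lambda printed earlier (0.1868876) and using a hypothetical helper `inv_box_cox` rather than a package function:

```r
# Inverse Box-Cox transform (illustrative helper, valid for lambda != 0)
inv_box_cox <- function(w, lambda) (lambda * w + 1)^(1 / lambda)

lambda_demo <- 0.1868876

# First and last point forecasts from the three transformed-data tables
fc_bc <- data.frame(
  model   = c("ETS", "ARIMA", "STL"),
  nov2020 = c(43.79227, 43.79962, 43.78882),
  oct2022 = c(43.68059, 43.78955, 43.65042),
  stringsAsFactors = FALSE
)

# Back on the original scale, in thousands of employees
fc_bc$nov2020_k <- inv_box_cox(fc_bc$nov2020, lambda_demo)
fc_bc$oct2022_k <- inv_box_cox(fc_bc$oct2022, lambda_demo)

print(fc_bc[, c("model", "nov2020_k", "oct2022_k")])
```

Inverting the point forecast this way gives a median rather than a mean on the original scale, so the back-transformed values are best read as approximate.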

For these reasons, I believe that the ARIMA(2,1,0) with drift model, fit to the raw data, performs best for this dataset.