1 Introduction


Economic indicators, especially those describing the labor market, are often used to make more informed decisions during periods of uncertainty. The unemployment rate is typically the first metric individuals, businesses, and government agencies look to when gauging the health of the economy. However, the unemployment rate is only a macro-level summary; much more is happening behind the scenes. In particular, job separations and the job separations rate feed directly into the unemployment rate, as "if the job finding rate remains constant, an increase in the job separation rate will increase the unemployment rate" (Wiczer, 2014). The separations rate also illuminates the current "sentiment" of the economy, since it encompasses two components: voluntary separations (quits) and involuntary separations (layoffs and discharges).
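To make the quoted relationship concrete, here is a minimal sketch of the standard steady-state approximation u* = s / (s + f), where s is the separation rate and f is the job-finding rate; the monthly rates used below are made up for the example and are not taken from JOLTS.

# Illustrative only: steady-state unemployment implied by constant separation (s)
# and job-finding (f) rates; these rates are invented for the example.
s <- 0.02                 # monthly job separation rate
f <- 0.40                 # monthly job finding rate
u.star <- s / (s + f)     # steady-state unemployment rate
u.star                    # about 0.048; raising s while f is fixed raises u.star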

The following report examines the labor market separations series, decomposes its trend and seasonality, and cross-validates the forecast results across different training sample sizes. The data come from the Job Openings and Labor Turnover Survey (JOLTS) conducted by the Bureau of Labor Statistics.
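The series used below was saved as a local CSV. Assuming it is the FRED series with the same ID (JTSOSL), it could presumably also be pulled directly with the quantmod package; a sketch of that alternative:

# Hypothetical alternative to the local CSV: download the series by ID from FRED
library(quantmod)
getSymbols("JTSOSL", src = "FRED")   # creates an xts object named JTSOSL
head(JTSOSL)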

2 Analysis


In the following analysis we first define the time series object with frequency = 12, decompose the series using classical and STL decomposition, and finally split the data into training and testing sets to cross-validate the forecast results across different sample sizes.

2.1 Import the data, define a time series object, and preliminary analysis


library(forecast)   # ma(), stl(), forecast()
library(magrittr)   # %>% (used later for the forecast table)
library(pander)     # pander() tables

# read the JTSOSL series and keep the 150 rows used in the analysis (Sep. 2010 - Feb. 2023)
jolts <- read.csv("C:/Users/Angelo/OneDrive/Desktop/College Babyyyyyyy/Fourth Year/STA321/data/JTSOSL.csv")
jolts <- jolts[118:267, ]

# monthly time series starting Sep. 2010
jolts.ts <- ts(jolts$JTSOSL, start = c(2010, 9), end = c(2023, 2), frequency = 12)

# centered moving average (centre = TRUE) to highlight the trend
trend.ma <- ma(jolts.ts, order = 4, centre = TRUE)
plot(jolts.ts, main = "Total Nonfarm Labor Market Separations", ylab = "Separations",
     xlab = "Period", col = "darkred")
lines(trend.ma, col = "blue", lwd = 2)
legend("topleft", c("original series", "trend curve"), lwd = rep(2, 2),
       col = c("darkred", "blue"), bty = "n")
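The order-4 average above spans only a third of a year; for monthly data, an order-12 (2x12) centered moving average is the more conventional trend estimate. A sketch of that alternative on the same series:

# Alternative trend estimate: 2x12 centered moving average (conventional for monthly data)
trend.ma12 <- ma(jolts.ts, order = 12, centre = TRUE)
plot(jolts.ts, main = "Total Nonfarm Labor Market Separations", ylab = "Separations",
     xlab = "Period", col = "darkred")
lines(trend.ma12, col = "blue", lwd = 2)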

In the time series plot above, we can observe various peaks and troughs over the period from 09-2010 to 02-2023. Particularly notable are the stretches from 2014 through 2016 and from mid-2021 through mid-2022, where separations reached their highest levels. The moving average trend curve confirms this pattern.
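As a quick numeric check of the visual peaks (not part of the original analysis), the dates of the five largest monthly values can be listed directly:

# dates and values of the five largest monthly observations
peak.idx <- order(jolts.ts, decreasing = TRUE)[1:5]
data.frame(date = as.numeric(time(jolts.ts))[peak.idx],
           separations = as.numeric(jolts.ts)[peak.idx])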

2.2 Classic and STL decomposition


We use two methods to decompose the separations series: classical decomposition and STL. Keep in mind that classical decomposition assumes a seasonal pattern that repeats identically every year and handles rapid changes in the trend poorly; STL, which estimates the components by iterated loess smoothing, is more robust, so it should produce the more reliable results here.

2.2.1 Classical Decomposition


cls.decomp = decompose(jolts.ts)   # classical (additive) decomposition
par(mar=c(2,2,2,2))
plot(cls.decomp, xlab="")

2.2.2 STL Decomposition


stl.decomp = stl(jolts.ts, s.window = 12)   # STL decomposition with a seasonal window of 12
par(mar=c(2,2,2,2))
plot(stl.decomp, main = "STL Decomposition")

Notice that the trend components from the classical and STL decompositions are very similar; however, the seasonal components differ considerably, which shows the different behaviors of the two decomposition methods. As stated above, STL is better suited to decomposing the seasonality of more complex time series.
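To see the difference directly, a short sketch (using the cls.decomp and stl.decomp objects fitted above) that overlays the two seasonal components:

# Overlay the seasonal components: classical repeats one fixed yearly pattern,
# STL allows the seasonal pattern to evolve over time
seas.cls <- cls.decomp$seasonal
seas.stl <- stl.decomp$time.series[, "seasonal"]
plot(seas.stl, col = "blue", lwd = 2, ylab = "Seasonal component",
     main = "Seasonal components: classical vs. STL")
lines(seas.cls, col = "darkred", lwd = 2)
legend("topleft", c("STL", "classical"), col = c("blue", "darkred"), lwd = 2, bty = "n")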

2.3 Forecasting and Cross-Validating


Next, we split the data into training and testing sets, holding out the final months of the series as test data, and train on four windows of different lengths, each ending in 08-2022: 1) 09-2010 through 08-2022, 2) 10-2014 through 08-2022, 3) 11-2016 through 08-2022, and 4) 12-2018 through 08-2022. Each training set is then used to forecast seven periods ahead (Sep. 2022 through Mar. 2023), and the forecasts are compared against the held-out observations.

ini.data = jolts[, 2]
n0 = length(ini.data)   # 150 observations in total
## four training windows, each ending with observation n0 - 6 (Aug. 2022)
train.data01 = jolts[1:(n0-6), 2]
train.data02 = jolts[50:(n0-6), 2]
train.data03 = jolts[75:(n0-6), 2]
train.data04 = jolts[100:(n0-6), 2]
## hold out the last 7 observations as test data
test.data = jolts[(n0-6):n0, 2]
## convert each training window to a monthly time series
train01.ts = ts(train.data01, frequency = 12, start = c(2010, 9))
train02.ts = ts(train.data02, frequency = 12, start = c(2014, 10))
train03.ts = ts(train.data03, frequency = 12, start = c(2016, 11))
train04.ts = ts(train.data04, frequency = 12, start = c(2018, 12))
## STL decomposition of each training series
stl01 = stl(train01.ts, s.window = 12)
stl02 = stl(train02.ts, s.window = 12)
stl03 = stl(train03.ts, s.window = 12)
stl04 = stl(train04.ts, s.window = 12)
## forecast 7 periods ahead from each decomposition (naive method on the seasonally adjusted series)
fcst01 = forecast(stl01, h = 7, method = "naive")
fcst02 = forecast(stl02, h = 7, method = "naive")
fcst03 = forecast(stl03, h = 7, method = "naive")
fcst04 = forecast(stl04, h = 7, method = "naive")
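The comparison that follows relies on a single hold-out window. As a complementary check (not part of the original design), the same STL-plus-naive forecaster could also be evaluated with a rolling origin via forecast::tsCV(); a sketch under that assumption:

## Rolling-origin (time series) cross-validation of the STL + naive forecaster;
## origins too short to fit an STL model should simply yield NA errors, dropped via na.rm
stlf.naive <- function(y, h) forecast(stl(y, s.window = 12), h = h, method = "naive")
cv.err <- tsCV(jolts.ts, stlf.naive, h = 1)
sqrt(mean(cv.err^2, na.rm = TRUE))   # one-step-ahead RMSE across all origins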

## point forecasts from fcst03, transcribed alongside their forecast months
fdp <- c("Sep. 2022", "Oct. 2022", "Nov. 2022", "Dec. 2022", "Jan. 2023", "Feb. 2023", "Mar. 2023",
         377.4679, 390.9996, 384.2322, 369.5012, 377.9487, 363.6742, 378.4626)

fcst03table <- matrix(fdp, nrow = 7, ncol = 2) %>% as.data.frame()
colnames(fcst03table) <- c("Months", "Point Forecast")

pander(fcst03table, caption = "Point estimates for Sep. 2022 - Mar. 2023")
Point estimates for Sep. 2022 - Mar. 2023

  Months      Point Forecast
  ---------   --------------
  Sep. 2022         377.4679
  Oct. 2022         390.9996
  Nov. 2022         384.2322
  Dec. 2022         369.5012
  Jan. 2023         377.9487
  Feb. 2023         363.6742
  Mar. 2023         378.4626

The table above shows the seven point forecasts (Sep. 2022 through Mar. 2023) produced by training set 3, which covers 11-2016 through 08-2022. We highlight this training set because, as shown in the next section, the corresponding sample size (labeled n = 69) produced the best overall accuracy.
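The point forecasts in the table above were transcribed by hand from fcst03. An equivalent table can be built directly from the forecast object, which avoids transcription errors; a sketch, assuming the fcst03 object and pander from the chunks above:

# build the same table programmatically from fcst03$mean
fcst03table2 <- data.frame(
  Months = c("Sep. 2022", "Oct. 2022", "Nov. 2022", "Dec. 2022",
             "Jan. 2023", "Feb. 2023", "Mar. 2023"),
  `Point Forecast` = round(as.numeric(fcst03$mean), 4),
  check.names = FALSE
)
pander(fcst03table2, caption = "Point estimates for Sep. 2022 - Mar. 2023")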

2.3.1 Comparing errors between training sets


Building upon the previous section, we now compare the errors (or accuracy) of the forecasts from the four training sets. In particular, we use two error measures: the mean squared error (MSE) and the mean absolute percentage error (MAPE).
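Written out as they are computed in the code below, for a test set y_1, ..., y_h and point forecasts ŷ_1, ..., ŷ_h (note that the percentage error here is taken relative to the forecast, and MAPE is left as a proportion rather than a percentage):

$$
\mathrm{MSE} = \frac{1}{h}\sum_{t=1}^{h}\left(y_t - \hat{y}_t\right)^2,
\qquad
\mathrm{MAPE} = \frac{1}{h}\sum_{t=1}^{h}\left|\frac{y_t - \hat{y}_t}{\hat{y}_t}\right|
$$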

## percentage errors relative to the point forecasts; left as proportions (not multiplied by 100)
PE01 = (test.data - fcst01$mean) / fcst01$mean
PE02 = (test.data - fcst02$mean) / fcst02$mean
PE03 = (test.data - fcst03$mean) / fcst03$mean
PE04 = (test.data - fcst04$mean) / fcst04$mean
### mean absolute percentage errors
MAPE1 = mean(abs(PE01))
MAPE2 = mean(abs(PE02))
MAPE3 = mean(abs(PE03))
MAPE4 = mean(abs(PE04))
### raw forecast errors
E1 = test.data - fcst01$mean
E2 = test.data - fcst02$mean
E3 = test.data - fcst03$mean
E4 = test.data - fcst04$mean
## mean squared errors
MSE1 = mean(E1^2)
MSE2 = mean(E2^2)
MSE3 = mean(E3^2)
MSE4 = mean(E4^2)
### collect the error measures into a comparison table
MSE = c(MSE1, MSE2, MSE3, MSE4)
MAPE = c(MAPE1, MAPE2, MAPE3, MAPE4)
accuracy = cbind(MSE = MSE, MAPE = MAPE)
row.names(accuracy) = c("n.144", "n.106", "n. 69", "n. 44")
pander(accuracy, caption = "Error comparison between forecast results with different sample sizes")
Error comparison between forecast results with different sample sizes

            MSE     MAPE
  n.144    5333   0.1701
  n.106    5373   0.1703
  n. 69    5081   0.1664
  n. 44    5068   0.1677

The table above reveals the error rates for each of the training sets. Training sets 3 and 4 (sample sizes n = 69 and n = 44, respectively) outperformed the other two. We now build a graphical representation of these error rates for easier interpretability.

2.3.2 Graphical representation


In the following section, we plot the error measures shown in the table above. We find that while n = 44 has the lowest MSE, n = 69 may be the better training set, as it has the lowest MAPE and an MSE very close to that of n = 44.

par(mfrow = c(2, 1), mar = c(3, 4, 3, 1))   # stack the MSE and MAPE curves

plot(seq(1, 4, by = 1), MSE, lwd = 2, type = "b", ylab = "MSE", xlab = "", cex = 0.3,
     main = "Error Curves", col = "blue", xaxt = 'n')
axis(1, at = seq(1, 4, by = 1), c("n=144", "n=106", "n=69", "n=44"))

plot(seq(1, 4, by = 1), MAPE, lwd = 2, type = "b", ylab = "MAPE", xlab = "", cex = 0.3,
     main = "", col = "darkred", xaxt = 'n')
axis(1, at = seq(1, 4, by = 1), c("n=144", "n=106", "n=69", "n=44"))

3 Conclusion


In this report we developed an approach to forecasting job separations based on the JOLTS data collected by the Bureau of Labor Statistics (BLS). In particular, we gave background on the job separations rate, gained preliminary insight into the series between Sep. 2010 and Feb. 2023, decomposed its trend and seasonality with classical and STL decomposition, split the data into training and testing sets by holding out the final months of the series, trained models on four different sample sizes to forecast the test data, computed the error rates of the four models, and presented those error rates graphically. We found that, of the four training sets, the one with a sample size of n = 69 is the best for forecasting future values of the job separations rate.