Economic indicators, especially in the labor market, are often looked to
to make more informed decisions under periods of uncertainty. The
unemployment rate is often the first metric individuals, businesses, and
government agencies look towards to understand the health of the
economy. However, the unemployment rate is just a macro-level indicator;
there is a lot more behind the scenes. In particular, job seperations
and the job seperations rate are indicative of the unemployment rate as
“if the job finding rate remains constant, an increase in the job
separation rate will increase the unemployment rate” (Wiczer,
2014). The job seperations rate also illuminates the current
“sentiment” of the economy as the metric encompasses two components:
voluntary (quits) and involuntary separations (layoffs and fires).
The following report looks at the labor market seperations rate and attempts to decompose the trends and cross-validate the forecast results with different sample sizes. The data comes from the Job Openings and Labor Turnover Survey (JOLTS) taken by the Bureau of Labor Statistics.
In the following analysis we will first define the time series object
with frequency=12, decompose the time series by classical and STL
decomposition, and finally split the data into training and testing to
cross-validate the forecast results with different sample sizes.
jolts <- read.csv("C:/Users/Angelo/OneDrive/Desktop/College Babyyyyyyy/Fourth Year/STA321/data/JTSOSL.csv")
jolts <- jolts[118:267,]
jolts.ts <- ts(jolts$JTSOSL, start = c(2010,9), end = c(2023,2), frequency = 12)
trend.beer = ma(jolts.ts, order = 4, centre = T) # centre = T => centered moving average
plot(jolts.ts, main = "Total Nonfarm Labor Market Seperations", ylab="Seperations", xlab="Period", col= "darkred")
lines(trend.beer, col="blue", lwd=2)
legend("topleft", c("original series", "trend curve"), lwd=rep(2,2),
col=c("darkred", "blue"), bty="n")
In the above time series graph, we can observe various peaks and troughs throughout the period between 09-2010 and 02-2023. Particularly important are the periods between 2014 through 2016 and mid-2021 through mid-2022 where seperations reached their peak. This is verified by the moving average trend curve.
We use two methods to decompose the trend of the seperations, classical
and STL. Keep in mind that classical decomposition suffers and
underperforms when the trend is robust; therefor, STL will produce more
reliable results.
cls.decomp = decompose(jolts.ts)
par(mar=c(2,2,2,2))
plot(cls.decomp, xlab="")
stl.decomp=stl(jolts.ts, s.window = 12)
par(mar=c(2,2,2,2))
plot(stl.decomp, main = "STL Decomposition")
We can notice that the trend lines between classical and STL decomposition are very similar; however, the seasonal trend is very different which shows the different behaviors of the two decompositions methods. As stated above, STL decomposition is better suited at decomposing the seasonality of more complex time series data.
Next, we will split the data into training and testing for four
different sample sizes holding out the last six observations
(months) of data as testing data (so from 09-2022 to 03-2023).
The four sample sizes are as follows: 1) 09-2010 through 08-2022, 2)
10-2014 through 08-2022, 3) 11-2016 through 08-2022, and 4) 12-2018
through 08-2022. We will then use these four different training sets to
forecast the last six observations (months) of data.
ini.data = jolts[,2]
n0 = length(ini.data)
##
train.data01 = jolts[1:(n0-6), 2]
train.data02 = jolts[50:(n0-6), 2]
train.data03 = jolts[75:(n0-6), 2]
train.data04 = jolts[100:(n0-6), 2]
## last 7 observations
test.data = jolts[(n0-6):n0,2]
##
train01.ts = ts(train.data01, frequency = 12, start = c(2010, 9))
train02.ts = ts(train.data02, frequency = 12, start = c(2014, 10))
train03.ts = ts(train.data03, frequency = 12, start = c(2016, 11))
train04.ts = ts(train.data04, frequency = 12, start = c(2018, 12))
##
stl01 = stl(train01.ts, s.window = 12)
stl02 = stl(train02.ts, s.window = 12)
stl03 = stl(train03.ts, s.window = 12)
stl04 = stl(train04.ts, s.window = 12)
## Forecast with decomposing
fcst01 = forecast(stl01,h=7, method="naive")
fcst02 = forecast(stl02,h=7, method="naive")
fcst03 = forecast(stl03,h=7, method="naive")
fcst04 = forecast(stl04,h=7, method="naive")
fdp <- c("Sep. 2022","Oct. 2022","Nov. 2022","Dec. 2022","Jan. 2023","Feb. 2023","Mar. 2023",377.4679,390.9996,384.2322,369.5012,377.9487,363.6742,378.4626)
fcst03table <- matrix(fdp, nrow=7, ncol=2) %>% as.data.frame()
colnames(fcst03table) <- c("Months","Point Forecast")
pander(fcst03table, caption = "Point estimates for Sep. 2022 - Mar. 2023")
| Months | Point Forecast |
|---|---|
| Sep. 2022 | 377.4679 |
| Oct. 2022 | 390.9996 |
| Nov. 2022 | 384.2322 |
| Dec. 2022 | 369.5012 |
| Jan. 2023 | 377.9487 |
| Feb. 2023 | 363.6742 |
| Mar. 2023 | 378.4626 |
The above table shows the forecasted values for the last six observations (months) of data for the period 11-2016 through 08-2022. We chose this training set based on the results in the next section where the sample size of 69 produced the best accurracy.
Building upon the previous section, we will now compare the errors (or
accurracy) of our four sample sizes. In particular, we will use two
error measures: mean square error (MSE) and mean absolute percentage
error (MAPE).
## To compare different errors, we will not use the percentage for MAPE
PE01=(test.data-fcst01$mean)/fcst01$mean
PE02=(test.data-fcst02$mean)/fcst02$mean
PE03=(test.data-fcst03$mean)/fcst03$mean
PE04=(test.data-fcst04$mean)/fcst04$mean
###
MAPE1 = mean(abs(PE01))
MAPE2 = mean(abs(PE02))
MAPE3 = mean(abs(PE03))
MAPE4 = mean(abs(PE04))
###
E1=test.data-fcst01$mean
E2=test.data-fcst02$mean
E3=test.data-fcst03$mean
E4=test.data-fcst04$mean
##
MSE1=mean(E1^2)
MSE2=mean(E2^2)
MSE3=mean(E3^2)
MSE4=mean(E4^2)
###
MSE=c(MSE1, MSE2, MSE3, MSE4)
MAPE=c(MAPE1, MAPE2, MAPE3, MAPE4)
accuracy=cbind(MSE=MSE, MAPE=MAPE)
row.names(accuracy)=c("n.144", "n.106", "n. 69", "n. 44")
pander(accuracy, caption="Error comparison between forecast results with different sample sizes")
| MSE | MAPE | |
|---|---|---|
| n.144 | 5333 | 0.1701 |
| n.106 | 5373 | 0.1703 |
| n. 69 | 5081 | 0.1664 |
| n. 44 | 5068 | 0.1677 |
The above table reveals the error rates of each of the training sets. Training sets 3 and 4 (sample sizes n=69,44 respectively) outperformed the other two models. We will now build a graphical representation of our error rates for easier interprebility.
In the following section, we build a graphical representation of the
error terms shown in the above table. We find that while n=44 has the
lowest MSE, n=69 might be the better training set as it has the lowest
MAPE and very similar MSE to n=44.
par(mfrow=c(2,1), mar=c(3,4,3,1))
plot(seq(1,4, by=1), MSE, lwd=2,type="b", ylab="MSE", xlab="", cex=0.3,
main="Error Curves", col = "blue", xaxt='n')
axis(1, at=seq(1,4,by=1),c("n=144", "n=106","n=69", "n=44"))
plot(seq(1,4, by=1), MAPE, lwd=2,type="b", ylab="MAPE", xlab="", cex=0.3,
main="", col = "darkred", xaxt='n')
axis(1, at=seq(1,4,by=1),c("n=144", "n=106","n=69", "n=44"))
We have developed in the above report a model to forecast the rate of
job seperation based on the JOLTS survey conducted by the Bureau of
Labor Statistics (BLS). In particular, we developed a background of the
job seperations rate, gained preliminary insight on the job seperations
rate between the periods of Sep. 2010 and Mar. 2023, decomposed the
trend of the job seperations rate using classical and STL decomposition
methods, split the data into training and testing by holding out the
last six observations (months), used four different sample sizes to
train four models to forecast the test data, calculated the error rates
of the four models, and developed a graphical representation of those
error rates. Lastly, we found that, out of the four training sets, the
training set with the sample size of n=69 is the best for forecasting
future values of the job seperation rate.