Romanian Unemployment Rate Forecasting

Data and empirical results

We have used in the empirical analysis the ILO unemployment rate for Romania covering the period 2000M01-2020M12, summing up a total of 252 monthly observations. The data source is the Employment and Unemployment database of Eurostat. We used for the model estimation and identification the estimation period 2000M1-2017M12 as training data and the period 2018M01-2020M12 as test data, while the forecast projections have been made for the next two years, 2021-2022.

# Import the dataset
unemployment_rate <- read_excel("C:/Users/40726/Desktop/Analytics/Entropy/unemployment_rate.xlsx")

# Creating unemployment rate time series object called y for simplifying the code
y <- ts(unemployment_rate, start=2000, frequency = 12)

# Splitting the data intro training and test sets
training <- window(y, start=2000, end=c(2017,12))
test <- tail(y, 12*3)

# Time series plot
autoplot(y) +
  ggtitle("The evolution of monthly Romanian unemployment rate") +
  xlab("Year") +
 ylab("%")

In January-October 2020, the medium unemployment rate stood at 4.9%, up 1.1 points year/year, an evolution determined by the incidence of the health crisis (and the consequences of this unprecedented shock), partially offset by the implementation of an unprecedented relaxed mix of economic policies.

# Distribution of Unemployment rate
ggplot(y, aes(x=y)) + 
  geom_histogram(bins = 25)+
  labs(title = "Distribution of Romanian Unemployment Rate")+
  xlab("")+ylab("")

summary(y)

##        UR       
##  Min.   :3.600  
##  1st Qu.:5.775  
##  Median :6.800  
##  Mean   :6.542  
##  3rd Qu.:7.400  
##  Max.   :9.500

sd(y)

## [1] 1.260771

skewness(y)

## [1] -0.5178184

kurtosis(y)

## [1] -0.2676865

jarque.bera.test(y)

## 
##  Jarque Bera Test
## 
## data:  y
## X-squared = 12.014, df = 2, p-value = 0.002461

Series <- c("Sample","Observations","Mean","Median","Maximum","Minimum",
            "Std.Dev.","Skewness","Kurtosis","Jarque-Bera","Probability")
`Unemployment Rate` <- c("2000M01 - 2020M12",252,6.542,6.800,9.500,3.600,
            1.260,-0.517,-0.267,12.014,"0.002***")
summary_statistics <- as.data.frame(cbind(Series,`Unemployment Rate`))
summary_statistics %>% gt() %>% tab_header(
  title = md("The **Romanian Unemployment Rate** statistics for the period 2000M1-2020M12"))

The Romanian Unemployment Rate statistics for the period 2000M1-2020M12
Series	Unemployment Rate
Sample	2000M01 - 2020M12
Observations	252
Mean	6.542
Median	6.8
Maximum	9.5
Minimum	3.6
Std.Dev.	1.26
Skewness	-0.517
Kurtosis	-0.267
Jarque-Bera	12.014
Probability	0.002***

The Romanian unemployment rate exhibits seasonal fluctuations over the period 2000-2020, with peaks in the last and the first months of the year.

# Seasonality subseries
ggsubseriesplot(y) +
  ylab("%") +
  ggtitle("Seasonal subseries plot: monthly unemployment rate")

# Polar seasonal plot
ggseasonplot(y, polar=TRUE) +
  ylab("%") +
  ggtitle("Polar seasonal plot:monthly unemployment rate ")

The evolution of the monthly unemployment rate is revealing a clear seasonal component in the data, confirmed also by the autocorrelation plot.

# Autocorrelation plot
ggAcf(y, lag=48,main="Autocorrelation")

# Partial autocorrelation plot
ggPacf(y,lag=48,main ="Partial Autocorrelation")

The empirical results of Holt Winters additive and multiplicative models revealed that because both models have exactly the same number of parameters to estimate, the training RMSE from both models can be compared, revealing that the method with multiplicative seasonality fits the data best. Also, based on informational criteria (AIC, AICc or BIC), the optimal model is also the multiplicative version of HW. Table 2 gives the results of the both in-sample and out-of-sample forecasting accuracy measures of the Holt-Winters methods for the unemployment rate.

# Holt Winter Additive forecast and accuracy
hw_additive <- hw(training,seasonal="additive",h=60)
summary(hw_additive)

## 
## Forecast method: Holt-Winters' additive method
## 
## Model Information:
## Holt-Winters' additive method 
## 
## Call:
##  hw(y = training, h = 60, seasonal = "additive") 
## 
##   Smoothing parameters:
##     alpha = 0.7503 
##     beta  = 1e-04 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 7.513 
##     b = -0.0138 
##     s = 0.1807 0.006 -0.0657 -0.3543 -0.3092 -0.3304
##            -0.2934 -0.3139 -0.0297 0.4549 0.5381 0.5169
## 
##   sigma:  0.2914
## 
##      AIC     AICc      BIC 
## 645.7898 648.8807 703.1695 
## 
## Error measures:
##                        ME      RMSE       MAE        MPE    MAPE      MASE
## Training set 0.0006147838 0.2804221 0.2108869 -0.1258617 3.06994 0.3353151
##                    ACF1
## Training set -0.1124383
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## Jan 2018       5.164393 4.790919 5.537867 4.5932136 5.735572
## Feb 2018       5.171809 4.704869 5.638748 4.4576859 5.885931
## Mar 2018       5.074853 4.530239 5.619466 4.2419386 5.907767
## Apr 2018       4.576490 3.963958 5.189022 3.6397027 5.513277
## May 2018       4.278537 3.604884 4.952190 3.2482738 5.308801
## Jun 2018       4.285263 3.555577 5.014950 3.1693043 5.401223
## Jul 2018       4.234494 3.452767 5.016221 3.0389459 5.430043
## Aug 2018       4.241893 3.411367 5.072418 2.9717131 5.512072
## Sep 2018       4.183014 3.306390 5.059639 2.8423335 5.523695
## Oct 2018       4.457812 3.537384 5.378240 3.0501387 5.865485
## Nov 2018       4.515833 3.553582 5.478084 3.0441971 5.987468
## Dec 2018       4.676753 3.674412 5.679093 3.1438051 6.209700
## Jan 2019       4.999181 3.958273 6.040088 3.4072503 6.591112
## Feb 2019       5.006597 3.928501 6.084693 3.3577918 6.655402
## Mar 2019       4.909641 3.795588 6.023694 3.2058446 6.613437
## Apr 2019       4.411278 3.262384 5.560172 2.6541964 6.168359
## May 2019       4.113325 2.930607 5.296044 2.3045142 5.922137
## Jun 2019       4.120052 2.904442 5.335662 2.2609365 5.979167
## Jul 2019       4.069283 2.821639 5.316926 2.1611760 5.977389
## Aug 2019       4.076681 2.797797 5.355564 2.1207974 6.032564
## Sep 2019       4.017803 2.708417 5.327189 2.0152695 6.020336
## Oct 2019       4.292600 2.953398 5.631802 2.2444674 6.340733
## Nov 2019       4.350621 2.982245 5.718997 2.2578703 6.443372
## Dec 2019       4.511541 3.114592 5.908489 2.3750924 6.647989
## Jan 2020       4.833969 3.409006 6.258932 2.6546763 7.013262
## Feb 2020       4.841385 3.388948 6.293822 2.6200741 7.062696
## Mar 2020       4.744429 3.265021 6.223838 2.4818694 7.006989
## Apr 2020       4.246066 2.740163 5.751970 1.9429855 6.549147
## May 2020       3.948114 2.416166 5.480061 1.6052024 6.291025
## Jun 2020       3.954840 2.397277 5.512403 1.5727536 6.336926
## Jul 2020       3.904071 2.321301 5.486841 1.4834330 6.324709
## Aug 2020       3.911469 2.303881 5.519058 1.4528745 6.370064
## Sep 2020       3.852591 2.220555 5.484627 1.3566067 6.348575
## Oct 2020       4.127388 2.471259 5.783518 1.5945567 6.660220
## Nov 2020       4.185409 2.505526 5.865293 1.6162488 6.754570
## Dec 2020       4.346329 2.643017 6.049642 1.7413372 6.951321
## Jan 2021       4.668758 2.942322 6.395193 2.0284015 7.309113
## Feb 2021       4.676173 2.926920 6.425427 2.0009209 7.351426
## Mar 2021       4.579218 2.807434 6.351001 1.8695088 7.288926
## Apr 2021       4.080854 2.286818 5.874891 1.3371131 6.824596
## May 2021       3.782902 1.966880 5.598924 1.0055362 6.560268
## Jun 2021       3.789628 1.951878 5.627379 0.9790315 6.600225
## Jul 2021       3.738859 1.879628 5.598090 0.8954109 6.582307
## Aug 2021       3.746257 1.865786 5.626729 0.8703246 6.622190
## Sep 2021       3.687379 1.785899 5.588860 0.7793162 6.595442
## Oct 2021       3.962177 2.039912 5.884442 1.0223260 6.902027
## Nov 2021       4.020198 2.077365 5.963030 1.0488910 6.991504
## Dec 2021       4.181117 2.217927 6.144308 1.1786764 7.183559
## Jan 2022       4.503546 2.520196 6.486896 1.4702735 7.536818
## Feb 2022       4.510962 2.507655 6.514268 1.4471684 7.574755
## Mar 2022       4.414006 2.390934 6.437078 1.3199847 7.508027
## Apr 2022       3.915643 1.872992 5.958293 0.7916782 7.039607
## May 2022       3.617690 1.555642 5.679739 0.4640589 6.771322
## Jun 2022       3.624417 1.543145 5.705688 0.4413871 6.807446
## Jul 2022       3.573647 1.473325 5.673970 0.3614810 6.785814
## Aug 2022       3.581046 1.461838 5.700254 0.3399969 6.822095
## Sep 2022       3.522168 1.384236 5.660099 0.2524839 6.791851
## Oct 2022       3.796965 1.640468 5.953462 0.4988875 7.095043
## Nov 2022       3.854986 1.680077 6.029895 0.5287494 7.181222
## Dec 2022       4.015906 1.822734 6.209077 0.6617395 7.370072

forecast::accuracy(hw_additive,test)

##                         ME      RMSE       MAE        MPE     MAPE      MASE
## Training set  0.0006147838 0.2804221 0.2108869 -0.1258617  3.06994 0.3353151
## Test set     -0.0371057405 0.7480342 0.6273125 -2.6100924 13.82676 0.9974416
##                    ACF1 Theil's U
## Training set -0.1124383        NA
## Test set      0.9000972  2.925882

# Holt Winter Multiplicative forecast and accuracy
hw_multiplicative <- hw(training,seasonal="multiplicative",h=60)
summary(hw_multiplicative)

## 
## Forecast method: Holt-Winters' multiplicative method
## 
## Model Information:
## Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = training, h = 60, seasonal = "multiplicative") 
## 
##   Smoothing parameters:
##     alpha = 0.6928 
##     beta  = 1e-04 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 7.4966 
##     b = -0.0043 
##     s = 1.024 1.0002 0.9867 0.951 0.9593 0.9538
##            0.9571 0.9501 1 1.0698 1.0776 1.0705
## 
##   sigma:  0.041
## 
##      AIC     AICc      BIC 
## 630.1872 633.2781 687.5669 
## 
## Error measures:
##                       ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.01236503 0.2770978 0.2086269 -0.3191448 3.036769 0.3317217
##                     ACF1
## Training set -0.05211779
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 2018       5.049816 4.784447 5.315185 4.643969 5.455663
## Feb 2018       5.078370 4.753500 5.403240 4.581524 5.575215
## Mar 2018       5.036710 4.665655 5.407765 4.469230 5.604189
## Apr 2018       4.703587 4.316679 5.090496 4.111862 5.295313
## May 2018       4.464549 4.062499 4.866600 3.849666 5.079432
## Jun 2018       4.492892 4.055966 4.929819 3.824671 5.161114
## Jul 2018       4.472720 4.007708 4.937732 3.761545 5.183895
## Aug 2018       4.494046 3.998375 4.989717 3.735983 5.252109
## Sep 2018       4.450701 3.933093 4.968309 3.659088 5.242314
## Oct 2018       4.613224 4.050301 5.176146 3.752308 5.474139
## Nov 2018       4.671785 4.076089 5.267481 3.760747 5.582823
## Dec 2018       4.778314 4.143827 5.412801 3.807950 5.748678
## Jan 2019       4.990085 4.302073 5.678098 3.937861 6.042309
## Feb 2019       5.018242 4.301641 5.734843 3.922295 6.114188
## Mar 2019       4.977017 4.242542 5.711491 3.853735 6.100298
## Apr 2019       4.647787 3.940338 5.355235 3.565838 5.729736
## May 2019       4.411532 3.720124 5.102940 3.354114 5.468950
## Jun 2019       4.439486 3.724150 5.154822 3.345474 5.533498
## Jul 2019       4.419500 3.688385 5.150616 3.301355 5.537645
## Aug 2019       4.440520 3.687262 5.193778 3.288511 5.592529
## Sep 2019       4.397638 3.633559 5.161717 3.229080 5.566196
## Oct 2019       4.558169 3.747819 5.368519 3.318845 5.797492
## Nov 2019       4.615976 3.777092 5.454859 3.333013 5.898938
## Dec 2019       4.721176 3.844840 5.597511 3.380936 6.061415
## Jan 2020       4.930354 3.996372 5.864337 3.501952 6.358757
## Feb 2020       4.958114 4.000270 5.915959 3.493217 6.423011
## Mar 2020       4.917323 3.949202 5.885444 3.436710 6.397937
## Apr 2020       4.591987 3.671221 5.512752 3.183797 6.000176
## May 2020       4.358515 3.468949 5.248081 2.998042 5.718988
## Jun 2020       4.386079 3.475396 5.296763 2.993309 5.778849
## Jul 2020       4.366281 3.444496 5.288066 2.956532 5.776030
## Aug 2020       4.386994 3.445751 5.328236 2.947487 5.826500
## Sep 2020       4.344576 3.397683 5.291468 2.896429 5.792723
## Oct 2020       4.503114 3.506566 5.499662 2.979025 6.027203
## Nov 2020       4.560166 3.535878 5.584455 2.993652 6.126680
## Dec 2020       4.664037 3.601134 5.726939 3.038468 6.289605
## Jan 2021       4.870624 3.744852 5.996396 3.148904 6.592343
## Feb 2021       4.897986 3.750186 6.045787 3.142577 6.653396
## Mar 2021       4.857630 3.703872 6.011388 3.093110 6.622150
## Apr 2021       4.536186 3.444525 5.627847 2.866635 6.205737
## May 2021       4.305498 3.255959 5.355037 2.700366 5.910630
## Jun 2021       4.332673 3.263156 5.402190 2.696988 5.968358
## Jul 2021       4.313062 3.235214 5.390909 2.664637 5.961487
## Aug 2021       4.333467 3.237404 5.429531 2.657183 6.009752
## Sep 2021       4.291513 3.193183 5.389843 2.611763 5.971264
## Oct 2021       4.448059 3.296428 5.599690 2.686791 6.209326
## Nov 2021       4.504357 3.324855 5.683858 2.700465 6.308249
## Dec 2021       4.606898 3.387055 5.826741 2.741309 6.472487
## Jan 2022       4.810893 3.523051 6.098735 2.841309 6.780477
## Feb 2022       4.837859 3.528846 6.146871 2.835897 6.839821
## Mar 2022       4.797936 3.485990 6.109883 2.791487 6.804386
## Apr 2022       4.480386 3.242534 5.718238 2.587255 6.373517
## May 2022       4.252481 3.065590 5.439371 2.437289 6.067673
## Jun 2022       4.279266 3.072901 5.485632 2.434290 6.124243
## Jul 2022       4.259842 3.047087 5.472598 2.405093 6.114592
## Aug 2022       4.279941 3.049618 5.510264 2.398325 6.161558
## Sep 2022       4.238450 3.008397 5.468504 2.357246 6.119655
## Oct 2022       4.393004 3.106088 5.679919 2.424836 6.361171
## Nov 2022       4.448547 3.133272 5.763823 2.437007 6.460087
## Dec 2022       4.549759 3.192267 5.907251 2.473655 6.625863

forecast::accuracy(hw_multiplicative,test)

##                       ME      RMSE       MAE        MPE      MAPE      MASE
## Training set -0.01236503 0.2770978 0.2086269 -0.3191448  3.036769 0.3317217
## Test set     -0.26698280 0.6905974 0.6524412 -7.8322030 15.139328 1.0373968
##                     ACF1 Theil's U
## Training set -0.05211779        NA
## Test set      0.90345688  2.968671

The empirical results of Holt Winters for the forecast of unemployment rate

`Model 1: Holt-Winters multiplicative method` <- c("Smoothing parameters:",
                                                    "Alpha(level) = 0.6928",
                                                    "Beta(trend)  = 0.0001",
                                                    "Gamma(seasonal) = 0.0001",
                                                   "AIC= 630.187",
                                                   "AICc= 633.278",
                                                   "BIC= 687.566")
`Model 2: Holt-Winters additive method` <-c("Smoothing parameters:",
                                            "Alpha(level) = 0.7503",
                                            "Beta(trend)  = 0.0001",
                                            "Gamma(seasonal) = 0.0001",
                                            "AIC= 645.789",
                                            "AICc= 648.8807",
                                            "BIC= 703.169")
empirical_results_HW <- as.data.frame(cbind(`Model 1: Holt-Winters multiplicative method`,`Model 2: Holt-Winters additive method`))
empirical_results_HW %>% gt() %>% tab_header(
  title = md("**The empirical results of HW for the forecast of unemployment rate**"))

The empirical results of HW for the forecast of unemployment rate
Model 1: Holt-Winters multiplicative method	Model 2: Holt-Winters additive method
Smoothing parameters:	Smoothing parameters:
Alpha(level) = 0.6928	Alpha(level) = 0.7503
Beta(trend) = 0.0001	Beta(trend) = 0.0001
Gamma(seasonal) = 0.0001	Gamma(seasonal) = 0.0001
AIC= 630.187	AIC= 645.789
AICc= 633.278	AICc= 648.8807
BIC= 687.566	BIC= 703.169

According to RMSE measure, the multiplicative model performs better than the additive one, while based on the other forecast accuracy measures (MAPE, MASE or MAE), the optimal model is the additive one, for which they registered the minimum values.

Metrics <- c("ME","RMSE","MAE","MPE","MAPE","MASE")
`HW multiplicative training` <- c(-0.0123,0.2770,0.2086,-0.3191,3.0367,0.3317)
`HW multiplicative test` <- c(-0.2669,0.6905,0.6524,-7.8322,15.1393,1.0373)
`HW additive training` <- c(0.006,0.2804,0.2108,-0.1258,3.0699,0.3353)
`HW additive test` <- c(-0.03710,0.7480,0.6273,-2.6100,13.8267,0.9974)

forecasting_performance_hw <- as.data.frame(cbind(Metrics,
                                                  `HW multiplicative training`,
                                                  `HW multiplicative test`,
                                                  `HW additive training`,
                                                  `HW additive test`))

  
forecasting_performance_hw %>% gt() %>% tab_header(
  title = md("**Forecasting Performance of Holt-Winters**"))

Forecasting Performance of Holt-Winters
Metrics	HW multiplicative training	HW multiplicative test	HW additive training	HW additive test
ME	-0.0123	-0.2669	0.006	-0.0371
RMSE	0.277	0.6905	0.2804	0.748
MAE	0.2086	0.6524	0.2108	0.6273
MPE	-0.3191	-7.8322	-0.1258	-2.61
MAPE	3.0367	15.1393	3.0699	13.8267
MASE	0.3317	1.0373	0.3353	0.9974

Analyzing the evolution of monthly unemployment rate for the period 2021-2022, it can be highlighted the fact that the forecast projections tend to under evaluate the actual series, not capturing the impact of the pandemics, and revealing a downward trend in both cases, more accentuated in the case of the multiplicative model.

# Plot of both hw models
autoplot(y) +
  autolayer(hw_additive, series="HW additive forecasts", PI=FALSE) +
  autolayer(hw_multiplicative, series="HW multiplicative forecasts",
            PI=FALSE) +
  xlab("Time") +
  ylab("Monthly unemployment rate(%)") +
  ggtitle("The forecast of unemployment rate based on HW models for the period 2021-2022") +
  guides(colour=guide_legend(title="Forecast"))

In the process of obtaining reliable forecast of the monthly unemployment rate, ETS automatic selection framework, based on minimizing the AICc revealed the optimal model to be an ETS (M, N, M) with multiplicative error, no trend and multiplicative season. The empirical results highlighted that on the training data set, the ETS model produces better results in comparison with HW additive or multiplicative methods. The ETS(M,N,M) model will provide different point forecasts to the multiplicative Holt-Winters’ method, because the parameters have been estimated differently, the default estimation method being maximum likelihood rather than minimum sum of squares.

# Fitting ets model
fit.ets_training<-ets(training)
#summary(fit.ets_training)

# ETS table preparation
`ETS(M, N, M) model: Multi-plicative Error, No trend, Multiplicative Season`<-
  c("Smoothing parameters:","Alpha(level) = 0.7914","Gamma(seasonal) = 0.0001",
    "AIC= 627.799","AICc= 630.199","BIC= 678.428")

empirical_results_ETS <- as.data.frame(`ETS(M, N, M) model: Multi-plicative Error, No trend, Multiplicative Season`)
empirical_results_ETS %>% gt() %>% tab_header(
  title = md("**The empirical results of ETS models for the forecast of unemployment rate**"))

The empirical results of ETS models for the forecast of unemployment rate
ETS(M, N, M) model: Multi-plicative Error, No trend, Multiplicative Season
Smoothing parameters:
Alpha(level) = 0.7914
Gamma(seasonal) = 0.0001
AIC= 627.799
AICc= 630.199
BIC= 678.428

Forecasting Performance of ETS model

# Forecasting ETS
fit.ets_training %>% forecast::forecast(h = 60) %>%
  forecast::accuracy()

##                       ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.01660129 0.2788063 0.2097566 -0.3682358 3.056985 0.3335179
##                    ACF1
## Training set -0.1629932

# Forecasting table preparation
Metrics <- c("ME","RMSE","MAE","MPE","MAPE","MASE")
`ETS training` <- c(-0.0166,0.2788,0.2097,-0.3682,3.0569,0.3335)
forecasting_performance_ets <- as.data.frame(cbind(Metrics,
                                                  `ETS training`))

forecasting_performance_ets %>% gt() %>% tab_header(
  title = md("**Forecasting Performance of ETS model**"))

Forecasting Performance of ETS model
Metrics	ETS training
ME	-0.0166
RMSE	0.2788
MAE	0.2097
MPE	-0.3682
MAPE	3.0569
MASE	0.3335

The plot of ETS (M, N, M) components displays the states over time while the second plot shows point forecasts and prediction intervals generated from the model. The empirical results of the model pointed out an under evaluation of the real values during the period of test data set from 2018 to 2020, highlighting an oscillate evolution characterized by a strong seasonal pattern also for the forecast projections period, 2021-2022.

# Components plot
autoplot(fit.ets_training)

# Checking the residuals
checkresiduals(fit.ets_training)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(M,N,M)
## Q* = 23.828, df = 10, p-value = 0.008071
## 
## Model df: 14.   Total lags used: 24

The forecast of unemployment rate based on the results of ETS (M, N, M)

# Forecast plot ETS with funchart
training %>% ets() %>% forecast::forecast(h=60) %>% autoplot()

In order to fit the NNAR model, the series of unemployment rate has been explored on the training data set in the process of identifying the order of an AR term present in the data, using the correlogram of the series. Based on ACF and PACF plot, a pure AR(1) process can be highlighted for the non-seasonal part. Also analyzing the ACF plot, the decaying spikes every 12 months interval indicates a seasonal component present in the data. As the autocorrelation at the seasonal period (ACF at lag 12) is positive, an autoregressive model for seasonal part should be considered, therefore the order P was set to 1. Therefore, a NNAR(1,1,k)12 model is fitted and the in-sample and out-sample root mean square error (RMSE), mean absolute error(MAE), mean absolute scale error (MASE) and mean absolute percentage error (MAPE) are provided in table where k =1, …, 14.

# Fitting the NNAR
UR_NNAR <- nnetar(training,p=1,P=1,size=10, lambda=0)

# Forecasting NNAR
fcast <- forecast::forecast(UR_NNAR, PI=TRUE, h=60)

fit_nnar_training <- forecast::forecast(UR_NNAR, PI=TRUE, h=24)

# K fold cross validation for training set for NNAR
# forecast::accuracy(fit_nnar_training, test) # k=1
# modelcv <- CVar(training, k=2)
# print(modelcv)
# modelcv <- CVar(training, k=3)
# print(modelcv)
# modelcv <- CVar(training, k=4)
# print(modelcv)
# modelcv <- CVar(training, k=5)
# print(modelcv)
# modelcv <- CVar(training, k=6)
# print(modelcv)
# modelcv <- CVar(training, k=7)
# print(modelcv)
# modelcv <- CVar(training, k=8)
# print(modelcv)
# modelcv <- CVar(training, k=9)
# print(modelcv)
# modelcv <- CVar(training, k=10)
# print(modelcv)
# # K fold cross validation for test set for NNAR
# modelcv <- CVar(test, k=2)
# print(modelcv)
# modelcv <- CVar(test, k=3)
# print(modelcv)
# modelcv <- CVar(test, k=4)
# print(modelcv)
# modelcv <- CVar(test, k=5)
# print(modelcv)
# modelcv <- CVar(test, k=6)
# print(modelcv)
# modelcv <- CVar(test, k=7)
# print(modelcv)
# modelcv <- CVar(test, k=8)
# print(modelcv)
# modelcv <- CVar(test, k=9)
# print(modelcv)
# modelcv <- CVar(test, k=10)
# print(modelcv)

# Output preparation

k <- c("RMSE Training","MAE Training","MAPE Training","MASE Training","RMSE Test","MAE Test"    ,"MAPE Test","MASE Test")
`1` <- c( 0.3570,   0.2734, 3.9654, 0.4348, 0.6792, 0.6399, 16.2143,    1.0174)
`2` <- c(   0.3477, 0.2662, 3.8562, 0.4233, 0.9019, 0.8542, 21.6274,    1.3582)
`3` <- c(   0.3402, 0.2604, 3.7626, 0.4141, 0.8510, 0.8044, 20.3754,    1.2790)
`4` <- c(   0.3329, 0.2553, 3.6772, 0.4059, 2.0452, 1.8630, 47.2547,    2.9622)
`5` <- c(   0.3297, 0.2524, 3.6264, 0.4013, 1.6242, 1.4196, 36.1478, 2.2572)
`6` <- c(   0.3228, 0.2464, 3.5341, 0.3918, 0.7710, 0.7208, 18.2993,    1.1461)
`7` <- c(   0.3195, 0.2443, 3.5057, 0.3884, 0.7739, 0.7221, 18.3387,    1.1482)
`8` <- c(   0.3173, 0.2421, 3.4737, 0.3850, 0.8042, 0.7518, 19.0849,    1.1954)
`9` <- c(   0.3167, 0.2421, 3.4681, 0.3850, 0.7873, 0.7356, 18.6744,    1.1696)
`10` <- c(  0.3150, 0.2411, 3.4513, 0.3834, 0.5979, 0.5508, 14.0168,    0.8758)
`11` <- c(  0.3087, 0.2362, 3.3860, 0.3757, 0.6936, 0.6450, 16.3913,    1.0256)
`12` <- c(  0.3033, 0.2329, 3.3456, 0.3704, 0.6220, 0.5747, 14.6184,    0.9139)
`13` <- c(  0.3058, 0.2339, 3.3533, 0.3719, 0.7008, 0.6510, 16.5462,    1.0351)
`14` <- c(  0.3064, 0.2357, 3.3779, 0.3749, 0.6944, 0.6452, 16.4001,    1.0260)

cross_validation <- as.data.frame(cbind(k,`1`,`2`,`3`,`4`,`5`,`6`,`7`,`8`,`9`,`10`,`11`,`12`,`13`,`14`))


cross_validation %>% gt() %>% tab_header(
  title = md("**Forecasting Performance of NNAR(1,1,k)12**"),
  subtitle = "Training and Test data sets")

Forecasting Performance of NNAR(1,1,k)12
Training and Test data sets
k	1	2	3	4	5	6	7	8	9	10	11	12	13	14
RMSE Training	0.357	0.3477	0.3402	0.3329	0.3297	0.3228	0.3195	0.3173	0.3167	0.315	0.3087	0.3033	0.3058	0.3064
MAE Training	0.2734	0.2662	0.2604	0.2553	0.2524	0.2464	0.2443	0.2421	0.2421	0.2411	0.2362	0.2329	0.2339	0.2357
MAPE Training	3.9654	3.8562	3.7626	3.6772	3.6264	3.5341	3.5057	3.4737	3.4681	3.4513	3.386	3.3456	3.3533	3.3779
MASE Training	0.4348	0.4233	0.4141	0.4059	0.4013	0.3918	0.3884	0.385	0.385	0.3834	0.3757	0.3704	0.3719	0.3749
RMSE Test	0.6792	0.9019	0.851	2.0452	1.6242	0.771	0.7739	0.8042	0.7873	0.5979	0.6936	0.622	0.7008	0.6944
MAE Test	0.6399	0.8542	0.8044	1.863	1.4196	0.7208	0.7221	0.7518	0.7356	0.5508	0.645	0.5747	0.651	0.6452
MAPE Test	16.2143	21.6274	20.3754	47.2547	36.1478	18.2993	18.3387	19.0849	18.6744	14.0168	16.3913	14.6184	16.5462	16.4001
MASE Test	1.0174	1.3582	1.279	2.9622	2.2572	1.1461	1.1482	1.1954	1.1696	0.8758	1.0256	0.9139	1.0351	1.026

The selection of the best model relied on the lowest values of all forecast accuracy measures (RMSE, MAE, MAPE and MASE), but especially on the values of MAPE and MASE which are scale independent, used to compare forecast accuracy across series on different scales and seen as an appropriate measure when the out-of-sample data is not of same length as the in-sample data. Based on the results of above, MASE and MAPE are lower for the training data set with 12 nodes in the hidden layer, whereas the out-of-sample MASE and MAPE are lower for 10 nodes in the hidden layer. There-fore, we can consider as the best choice the model NNAR (1,1,10)12. The forecast of the unemployment rate based on the NNAR(1,1,10)12 model results revealed a downward trend with a peak in September 2018 (4.43%) and with a forecasting value for 2021-2022 oscillating around the value of 4.35%.

Forecasts from a neural network with one seasonal and non-seasonal lagged input and one hidden layer containing ten neurons.

# NNAR forecasting
autoplot(fcast)

For fitting a SARIMA model, we used data covering the period January 2000 to December 2017.The series exhibited a strong seasonal pattern over the horizon 2000-2017.
In order to fit a suitable time series model, the stationarity need to be investigated based on Augmented Dickey-Fuller and Philips-Perron tests. The graphical inspection of the autocorrelation and Partial Correlation Plot of Romania’s quarterly unemployment rate revealed that the values of autocorrelation coefficients decrease slowly, pointing out a non-stationary and relatively stable seasonal pattern of our time series. Also the time series plot of the first difference of the series highlighted that the unemployment rate is a non-stationary mean time series. The information is also con-firmed by the empirical results of Bartlett and Ljung-Box tests. Also the time series plot of the first difference of the series highlighted that the first difference of the unemployment rate seems a stationary mean time series. Therefore, the original quarterly series is a non-stationary time series.

# Display the first lag difference in order to help to identify the SARIMA
training%>% diff(lag=1) %>% ggtsdisplay()

The diagram series plot indicate that possible stationarity exists in first differences. Alternately, we investigated the presence of unit roots by applying the Augmented Dickey-Fuller and Phillips-Peron tests initially to the series in level and then to the series in first differences. The empirical results on unemployment rate are displayed below, indicating that the series of unemployment rate is stationary in first differences, being integrated of order 1.

# Create the new lagged 1 difference series
training_sarima <- training%>% diff(lag=1) 

# Checking ADF
# summary(ur.df(training, type = c("none"), lags = 1))
# summary(ur.df(training_sarima, type = c("none"), lags = 1))
# 
# summary(ur.df(training, type = c("drift"), lags = 1))
# summary(ur.df(training_sarima, type = c("drift"), lags = 1))
# 
# summary(ur.df(training, type = c("trend"), lags = 1))
# summary(ur.df(training_sarima, type = c("trend"), lags = 1))

# Checking Philip-Peron
# summary(ur.pp(training,type = c("Z-alpha"), model = c("constant", "trend")))
# summary(ur.pp(training_sarima,type = c("Z-alpha"), model = c("constant", "trend")))
# 
# summary(ur.pp(training,type = c("Z-alpha"), model = c("constant")))
# summary(ur.pp(training_sarima,type = c("Z-alpha"), model = c("constant")))
# 
# summary(ur.pp(training,type = c("Z-alpha"), model = c("trend")))
# summary(ur.pp(training_sarima,type = c("Z-alpha"), model = c("trend")))

`Unit Root` <- c("ADF level","PP level","ADF first difference","PP first difference")
`T&C` <- c("-3.56**","-3.52**","-15.87***","-16.20***")
`C` <- c("-2.58*",  "-2.72*",       "-15.90***",    "-16.01***")
`None` <- c("-0.90",    "-0.98",        "-15.91***",    "-16.01***")

unit_root <- as.data.frame(cbind(`Unit Root`,`T&C`,`C`,`None`))

unit_root %>% gt() %>% tab_header(
  title = md("**Unit root analysis of the Romanian unemployment rate**")) %>%
  tab_source_note(
    source_note = "Note: ***, **, * means stationary at 1%, 5% and 10%; T&C represents the most general model with a constant and trend; C is the model with a constant and without trend; None is the most restricted model without a drift and trend"
  )

Unit root analysis of the Romanian unemployment rate
Unit Root	T&C	C	None
ADF level	-3.56**	-2.58*	-0.90
PP level	-3.52**	-2.72*	-0.98
ADF first difference	-15.87***	-15.90***	-15.91***
PP first difference	-16.20***	-16.01***	-16.01***
Note: *, , * means stationary at 1%, 5% and 10%; T&C represents the most general model with a constant and trend; C is the model with a constant and without trend; None is the most restricted model without a drift and trend

The next step was to test the presence of a structural break around 2009, taking into account that the presence of a structural break will invalidate the results of unit root tests. Therefore, Zivot-Andrews test has been used, the empirical result revealing that there is no enough evidence to reject both null hypothesis of unemployment has a unit root with structural break in trend, and in both intercept and trend.

# Structural breaks - Zivot Andrew test
za_trend <- ur.za(training, model =  "trend")
za_both <- ur.za(training, model =  "both")
#summary(za_trend)
#summary(za_both)

# Creating the table for ZA test
row1 <- c("Statistics","Allowing for break in trend","Allowing for break in both intercept and trend")
row2 <- c("Minimum t-stat p-value", "-4.139 (0.000)",   "-4.501 (0.000)")
row3 <- c("1%", "-4.93",    "-5.57")
row4 <- c("5%", "-4.42",    "-5.08")
row5 <- c("10%",    "-4.11",    "-4.82")
row6 <- c("Potential break point","2015M08","2009M06")

zivot_andrew <- as.data.frame(rbind(row1,row2,row3,row4,row5,row6))

zivot_andrew %>% gt() %>% tab_header(
  title = md("**Zivot-Andrews unit root test having structural break for unemployment rate**"))

Zivot-Andrews unit root test having structural break for unemployment rate
V1	V2	V3
Statistics	Allowing for break in trend	Allowing for break in both intercept and trend
Minimum t-stat p-value	-4.139 (0.000)	-4.501 (0.000)
1%	-4.93	-5.57
5%	-4.42	-5.08
10%	-4.11	-4.82
Potential break point	2015M08	2009M06

Thus, the empirical results proved that the unemployment rate is non-stationary and integrated of order 1, I(1).
However, because the series of unemployment exhibits a seasonal pattern over the training period, the study will use a seasonal ARIMA model instead of non-seasonal models; therefore, it is necessary to check whether the seasonality is needed to be differenced or not, testing if the stochastic seasonality is present within the data, the empirical results of Hegy test revealing the rejection of seasonal unit root and the acceptance of only a non-seasonal unit root. Therefore, seasonal difference is not needed.

# Hegy Test
hegy.test(training)

## 
##  HEGY test for unit roots
## 
## data:  training
## 
##         statistic p-value    
## t_1       -0.2479  0.9208    
## t_2       -5.2363       0 ***
## F_3:4      7.8026   5e-04 ***
## F_5:6     16.0819       0 ***
## F_7:8     17.8412       0 ***
## F_9:10    25.1696       0 ***
## F_11:12   17.0693       0 ***
## F_2:12    64.8717       0 ***
## F_1:12    61.2199       0 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Deterministic terms: constant 
## Lag selection criterion and order: fixed, 0
## P-values: based on response surface regressions

Therefore, we can conclude that the unemployment rate is a non-stationary series, without stochastic seasonality and integrated of order 1. Thus, the rate of unemployment will be modeled at the first difference of the series within SARIMA model

For the first difference of the UR, the model identification implies the identification of proper values of p, P, q, Q using the ACF and PACF plot. The seasonal part of an AR or MA model will be seen in the seasonal lags. The ACF plot has a spike at lags 4 and 6 and an exponential decay starting from seasonal lag 12, suggesting a potential non-seasonal MA component-MA(4) or MA(6)
Besides, the PACF plot shows that lags 4, 6 and 12 are significant, capturing also potential non-seasonal AR components together with a seasonal AR(1). In our case, because the autocorrelation at the seasonal lags (12, 24) is positive, a combination of seasonal and non-seasonal autoregressive models can be identified. Thus, several models have been specified and based on AIC and BIC together with the goodness of fit measures, the best model has been identified.
Thus, several models have been specified and based on AIC and BIC together with the goodness of fit measures, the best model has been identified, taking into account the lowest values of AIC and SBC. The best model has been an ARIMA(0,1,6)(1,0,1)12 considered based on the minimum value of AIC and BIC.

# Fitting the optimal SARIMA model
fit_sarima <- Arima(training, order=c(0,1,6), seasonal=c(1,0,1))

# Coefficients test
coeftest(fit_sarima)

## 
## z test of coefficients:
## 
##       Estimate Std. Error  z value  Pr(>|z|)    
## ma1  -0.334617   0.069856  -4.7901 1.667e-06 ***
## ma2   0.163104   0.072163   2.2602   0.02381 *  
## ma3   0.070413   0.075991   0.9266   0.35414    
## ma4  -0.097397   0.077389  -1.2585   0.20820    
## ma5  -0.011470   0.072591  -0.1580   0.87445    
## ma6  -0.123155   0.069532  -1.7712   0.07653 .  
## sar1  0.983605   0.015399  63.8766 < 2.2e-16 ***
## sma1 -0.846196   0.066935 -12.6421 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Summary of results
summary(fit_sarima)

## Series: training 
## ARIMA(0,1,6)(1,0,1)[12] 
## 
## Coefficients:
##           ma1     ma2     ma3      ma4      ma5      ma6    sar1     sma1
##       -0.3346  0.1631  0.0704  -0.0974  -0.0115  -0.1232  0.9836  -0.8462
## s.e.   0.0699  0.0722  0.0760   0.0774   0.0726   0.0695  0.0154   0.0669
## 
## sigma^2 estimated as 0.08692:  log likelihood=-45.21
## AIC=108.42   AICc=109.3   BIC=138.76
## 
## Training set error measures:
##                      ME      RMSE       MAE        MPE    MAPE      MASE
## Training set -0.0170657 0.2886134 0.2216369 -0.3767238 3.20478 0.3524078
##                     ACF1
## Training set -0.01044607

Apart of classical tests, t-test for the statistical significance of the parameters and F-test for the validity of the model, the selection of the best model depends also on the performance of residuals. For that, the series of residuals has been investigated to fol-low a white noise. The empirical results of Ljung-Box test shows that the p-values of the test statistic exceed the 5% level of significance for all lag orders which implies that there is no significant autocorrelation in residuals.

# Check residuals plot
checkresiduals(fit_sarima, lag=48)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,6)(1,0,1)[12]
## Q* = 40.434, df = 40, p-value = 0.4511
## 
## Model df: 8.   Total lags used: 48

For checking autoregressive conditional heteroskedasticity (ARCH) in the residuals the ARCH-LM test has been used, the empirical results confirmed that there is no autoregressive conditional heteroscedasticity (ARCH) in the residuals (table 11). Therefore, we can conclude that residuals are not autocorrelated and don’t form ARCH models, the SARIMA (0,1,6)(1,0,1)12 model being reliable for forecasting.

# Ljung Box test for residuals
# Box.test(fit_sarima$residuals, lag = 12, type = c("Ljung-Box"))
# Box.test(fit_sarima$residuals, lag = 24, type = c("Ljung-Box"))
# Box.test(fit_sarima$residuals, lag = 36, type = c("Ljung-Box"))
# Box.test(fit_sarima$residuals, lag = 48, type = c("Ljung-Box"))

# Arch LM test for residuals
# ArchTest(fit_sarima$residuals, lags=12)
# ArchTest(fit_sarima$residuals, lags=24)
# ArchTest(fit_sarima$residuals, lags=36)
# ArchTest(fit_sarima$residuals, lags=48)


# Ljung Box and ARCH LM table
Lags <- c("Ljung-Box test","P-value","ARCH-LM test","P-value")
`12` <- c("2.9459", "0.9959",   "9.1184",   "0.6928")
`24` <- c("15.123", "0.9171",   "44.267",   "0.2345")
`36` <- c("25.531", "0.9029",   "51.336",   "0.1878")
`48` <- c("40.434", "0.7727",   "58.159",   "0.1495")

lb_archlm <- as.data.frame(cbind(Lags,`12`,`24`,`36`,`48`))


lb_archlm %>% gt() %>% tab_header(
  title = md("**Empirical results of Ljung-Box test and ARCH-LM test for model residuals**"))

Empirical results of Ljung-Box test and ARCH-LM test for model residuals
Lags	12	24	36	48
Ljung-Box test	2.9459	15.123	25.531	40.434
P-value	0.9959	0.9171	0.9029	0.7727
ARCH-LM test	9.1184	44.267	51.336	58.159
P-value	0.6928	0.2345	0.1878	0.1495

Forecasting Performance of SARIMA(0,1,6)(1,0,1)x12

# Forecast sarima
fit_sarima_accuracy<- fit_sarima %>% forecast::forecast(h=60)

# Check the accuracy
forecast::accuracy(fit_sarima_accuracy, test)

##                       ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.01706570 0.2886134 0.2216369 -0.3767238  3.20478 0.3524078
## Test set      0.06253783 0.7640920 0.6153427 -0.4364118 13.37031 0.9784094
##                     ACF1 Theil's U
## Training set -0.01044607        NA
## Test set      0.91605107   2.91675

The forecast of the unemployment rate based on the ARIMA(0,1,6)(1,0,1)x12 model results revealed a downward trend with a forecasting value for 2021-2022 oscillating around the value of 3-4%

# Forecast plot
fit_sarima_accuracy %>% forecast::forecast(h=60) %>%autoplot()

Romanian Unemployment Rate Forecasting

Eduard Manta, Adriana Davidescu

27/05/2021

Comparative analysis of different univariate forecasting methods in modelling and predicting the Romanian unemployment rate for the period 2021-2022

Data and empirical results