Sasidhar Maddipatla
December 5th 2021
What are hard data and soft data?
"Hard" data is obtained from government statistical agencies and other sources, and is examined and scrutinized for insights into the broad economy. "Soft" data is derived from surveys of businesses and from consumer confidence and sentiment surveys, and is used in a similar way: to infer future business outcomes and performance.
In our business scenario, does adding soft data such as the volatility index and the uncertainty index improve the univariate (employment) forecast?
Monthly employment data, FRED series PAYEMS (hard data)
Monthly fear (volatility) index, FRED series VIXCLS (soft data)
Monthly uncertainty index, FRED series USEPUINDXD (soft data)
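# Packages: pdfetch (FRED downloads), xts (monthly aggregation), TSstudio (ts_split),
# forecast (auto.arima, nnetar, ndiffs, accuracy, forecast), forecastHybrid (hybridModel),
# ggplot2 (autoplot/autolayer plotting)
library(forecast)
library(forecastHybrid)
library(TSstudio)
library(ggplot2)
# Employment data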
empl = pdfetch::pdfetch_FRED("PAYEMS") #Monthly and seasonally adjusted data.
# Fear Index data
vix = pdfetch::pdfetch_FRED("VIXCLS") # the fear index
vix.m = xts::to.monthly(vix, indexAt = "yearmon", drop.time = TRUE)
vix.m = vix.m[, 4]  # keep column 4 (Close): the last daily value of each month
#Uncertainty index
uix = pdfetch::pdfetch_FRED("USEPUINDXD") # the uncertainty index
uix.m = xts::to.monthly(uix, indexAt = "yearmon", drop.time = TRUE)
uix.m = uix.m[, 4]  # keep column 4 (Close): the last daily value of each month
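`xts::to.monthly()` turns each daily series into monthly open/high/low/close columns, and `[, 4]` keeps the close, i.e. the last available daily value of each month. A base-R sketch of the same idea, assuming simulated daily data and a hypothetical helper `month_end_last()` (neither is part of the original analysis):

```r
# Simulated daily index values (hypothetical data, not VIX)
set.seed(1)
dates <- seq(as.Date("2021-01-01"), as.Date("2021-03-31"), by = "day")
values <- 20 + cumsum(rnorm(length(dates)))

# Hypothetical helper: keep the last non-missing value of each calendar month,
# mirroring the "close" column that to.monthly() produces
month_end_last <- function(dates, values) {
  ok <- !is.na(values)
  month <- format(dates[ok], "%Y-%m")
  tapply(values[ok], month, function(v) v[length(v)])
}

monthly_close <- month_end_last(dates, values)
print(monthly_close)
```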
#Convert to TS
empl.ts = ts(empl, start = c(1939,1), end = c(2021, 11), frequency = 12)
vix.m.ts = ts(vix.m, start = c(1990,1), end = c(2021, 12), frequency = 12)
uix.m.ts = ts(uix.m, start = c(1985,1), end = c(2021, 12), frequency = 12)
# Trim all three series to the common sample (1990-01 to 2021-11), the period covered by every variable
empl.ts = window(empl.ts, start = c(1990, 1), end = c(2021, 11))
vix.m.ts = window(vix.m.ts, start = c(1990, 1), end = c(2021, 11))
uix.m.ts = window(uix.m.ts, start = c(1990, 1), end = c(2021, 11))
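The `window()` calls above align the three series on the same months. Base R's `stats::ts.intersect()` performs the same alignment in one step by keeping only the periods shared by all series. A sketch on two simulated monthly series (the data are hypothetical):

```r
# Hypothetical monthly series with different start and end dates
set.seed(2)
a <- ts(rnorm(12 * 5), start = c(2015, 1), frequency = 12)  # 2015-01 to 2019-12
b <- ts(rnorm(12 * 4), start = c(2016, 6), frequency = 12)  # 2016-06 to 2020-05

# Option 1: trim each series explicitly with window()
a_w <- window(a, start = c(2016, 6), end = c(2019, 12))
b_w <- window(b, start = c(2016, 6), end = c(2019, 12))

# Option 2: keep only the overlapping periods of both series at once
both <- ts.intersect(a = a, b = b)

print(start(both))
print(end(both))
```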
# Employment data: chronological train/test split
emp_split=ts_split(empl.ts)
length(emp_split$train);length(emp_split$test)
start(emp_split$train); end(emp_split$train)
start(emp_split$test); end(emp_split$test)
# Uncertainty and volatility variables split
vix_split=ts_split(vix.m.ts)
length(vix_split$train); length(vix_split$test)
start(vix_split$train); end(vix_split$train)
start(vix_split$test); end(vix_split$test)
usep_split=ts_split(uix.m.ts)
length(usep_split$train);length(usep_split$test)
start(usep_split$train); end(usep_split$train)
start(usep_split$test); end(usep_split$test)
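`ts_split()` from TSstudio cuts a series into a training part and a later test part without shuffling, so the test period always follows the training period in time. A base-R sketch of such a chronological split; the helper `split_ts()` and its `test_len` argument are hypothetical and are not TSstudio's implementation:

```r
# Hypothetical helper: chronological train/test split of a ts object
split_ts <- function(x, test_len) {
  n <- length(x)
  stopifnot(test_len > 0, test_len < n)
  tm <- time(x)
  list(
    train = window(x, end = tm[n - test_len]),
    test  = window(x, start = tm[n - test_len + 1])
  )
}

# Hypothetical monthly series: Jan 2000 to Dec 2009 (120 observations)
y <- ts(seq_len(120), start = c(2000, 1), frequency = 12)
s <- split_ts(y, test_len = 24)
length(s$train); length(s$test)
end(s$train); start(s$test)
```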
model_arima = auto.arima(emp_split$train)
accuracy(model_arima)
ME RMSE MAE MPE MAPE MASE ACF1
Training set -1.34 123 93.8 -0.000791 0.0744 0.0435 -0.0101
arima_fc = forecast(model_arima, h = length(emp_split$test))
autoplot(arima_fc)+autolayer(emp_split$test)
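`accuracy(model_arima)` reports errors on the training data. To score the forecast itself, the errors have to be computed against the held-out test set; with the forecast package this is `accuracy(arima_fc, emp_split$test)`. A base-R sketch of that evaluation, assuming simulated data and a fixed ARIMA(1,1,0) with drift instead of the order `auto.arima()` would choose:

```r
# Simulated trending monthly series (hypothetical data, not PAYEMS)
set.seed(3)
y <- ts(100 + cumsum(0.5 + rnorm(144)), start = c(2000, 1), frequency = 12)
train <- window(y, end = c(2009, 12))   # 120 months
test  <- window(y, start = c(2010, 1))  # 24 months

# Fixed ARIMA(1,1,0); drift enters through a time-index regressor because
# stats::arima() drops the constant when d = 1
tt <- seq_along(train)
fit <- arima(train, order = c(1, 1, 0), xreg = tt)
fc <- predict(fit, n.ahead = length(test),
              newxreg = length(train) + seq_along(test))

# In-sample fit versus out-of-sample forecast error
rmse_train <- sqrt(mean(residuals(fit)^2))
rmse_test <- sqrt(mean((test - fc$pred)^2))
print(c(train = rmse_train, test = rmse_test))
```

The in-sample RMSE describes how well the model tracks data it has already seen; only the test-set RMSE answers whether a model forecasts better.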
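# Regressor matrices with the same column names in the training and test periods
mod_data_train = cbind(vix = vix_split$train, uix = usep_split$train)
mod_data_test  = cbind(vix = vix_split$test,  uix = usep_split$test)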
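# models = 'ant': auto.arima, nnetar and tbats; xreg goes to auto.arima (a.args) and nnetar (n.args)
model_hybrid_x = forecastHybrid::hybridModel(emp_split$train,
                                             a.args = list(xreg = mod_data_train),
                                             n.args = list(xreg = mod_data_train),
                                             models = 'ant')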
accuracy(model_hybrid_x)
ME RMSE MAE MPE MAPE ACF1 Theil's U
Test set 6.88 119 92.8 0.0059 0.0732 0.0259 0.478
modelhybrid_fc = forecast(model_hybrid_x, h = length(emp_split$test), xreg = mod_data_test)  # regressor values for the forecast (test) period
autoplot(modelhybrid_fc)+autolayer(emp_split$test)
autoplot(empl.ts, col = "darkred") +
autolayer(modelhybrid_fc) +autolayer(emp_split$test)
autoplot(empl.ts) +
autolayer(modelhybrid_fc$mean)+
autolayer(emp_split$test)
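The hybrid model combines the forecasts of its component models; by default `hybridModel()` weights them equally. A base-R sketch of an equal-weight forecast combination, assuming simulated data and two simple stand-in models (an ARIMA(1,1,0) with drift and a random walk with drift) rather than the three models used above:

```r
set.seed(4)
y <- ts(100 + cumsum(0.5 + rnorm(144)), start = c(2000, 1), frequency = 12)
train <- window(y, end = c(2009, 12))
test  <- window(y, start = c(2010, 1))
h <- length(test)

# Component 1: ARIMA(1,1,0) with drift (time index as regressor)
fit_arima <- arima(train, order = c(1, 1, 0), xreg = seq_along(train))
fc_arima <- predict(fit_arima, n.ahead = h,
                    newxreg = length(train) + seq_len(h))$pred

# Component 2: random walk with drift (last value + average monthly change)
drift <- mean(diff(train))
fc_rwd <- ts(train[length(train)] + drift * seq_len(h),
             start = c(2010, 1), frequency = 12)

# Equal-weight combination of the component forecasts
fc_combo <- (fc_arima + fc_rwd) / 2

rmse <- function(f) sqrt(mean((test - f)^2))
scores <- sapply(list(arima = fc_arima, rwd = fc_rwd, combo = fc_combo), rmse)
print(scores)
```

Because RMSE is convex in the forecast, an equally weighted average can never score worse than the weaker of its two components, which is one reason simple combinations are a robust default.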
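# nnetar() starts from random weights, so its results differ slightly between runs unless set.seed() is called first
model_nnetar = nnetar(emp_split$train)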
accuracy(model_nnetar)
ME RMSE MAE MPE MAPE MASE ACF1
Training set 0.231 143 107 0.0000345 0.0841 0.0495 0.438
nnetar_fc = forecast(model_nnetar, h = length(emp_split$test))
autoplot(empl.ts, col = "darkred") +
autolayer(nnetar_fc)+autolayer(emp_split$test)
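`nnetar()` fits a feed-forward neural network whose inputs are lagged values of the series (an NNAR(p, P, k) model, with the lag orders and hidden-layer size chosen automatically). The sketch below only shows how such a lagged input matrix can be built with base R's `embed()`, assuming a toy series and a hypothetical `p = 3`; fitting the network itself is left to `nnetar()`:

```r
# Toy series (hypothetical)
y <- c(10, 12, 13, 15, 14, 16, 18)
p <- 3  # hypothetical number of autoregressive lags

# embed() returns rows (y_t, y_{t-1}, ..., y_{t-p})
m <- embed(y, p + 1)
target <- m[, 1]       # value the network would predict
inputs <- m[, -1]      # lags 1..p used as network inputs
colnames(inputs) <- paste0("lag", 1:p)
print(cbind(target, inputs))
```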
ndiffs(empl.ts)
[1] 1
empl1.ts = diff(empl.ts)  # first difference: month-over-month change in employment
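After differencing, models work on month-over-month changes. Forecasts of changes can be turned back into levels by cumulating them from the last observed level, and base R's `diffinv()` inverts the differencing of an observed series. A sketch on a toy series, with hypothetical values and a hypothetical vector of forecasted changes:

```r
# Toy level series (hypothetical)
level <- ts(c(100, 103, 101, 106, 110), start = c(2021, 1), frequency = 12)

# First difference: month-over-month changes
chg <- diff(level)

# Invert the differencing: cumulate changes starting from the first level
rebuilt <- diffinv(chg, xi = level[1])
print(all.equal(as.numeric(rebuilt), as.numeric(level)))

# Forecasted changes (hypothetical) mapped back to levels
chg_fc <- c(2, 1.5, -1)
level_fc <- level[length(level)] + cumsum(chg_fc)
print(level_fc)  # 112.0 113.5 112.5
```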
emp_split1=ts_split(empl1.ts)
model_nnetar = nnetar(emp_split1$train)
accuracy(model_nnetar)
ME RMSE MAE MPE MAPE MASE ACF1
Training set 0.0767 3.58 1.65 -0.44 4.35 0.00856 -0.254
nnetar_fc = forecast(model_nnetar, h = length(emp_split1$test))
autoplot(empl1.ts, col = "darkred") +
autolayer(nnetar_fc)+autolayer(emp_split1$test)
model_arimax=auto.arima(emp_split$train,
xreg=mod_data_train)
accuracy(model_arimax)
ME RMSE MAE MPE MAPE MASE ACF1
Training set 10.1 122 93.9 0.00861 0.0745 0.0436 -0.00932
arimax_fc = forecast(model_arimax, h = length(emp_split$test), xreg = mod_data_test)  # test-period regressors; nrow(xreg) sets the horizon
autoplot(arimax_fc) + autolayer(emp_split$test)
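With regressors, the forecast needs the regressor values for the forecast horizon (the test-period matrix), not the training matrix; for `forecast()` on an ARIMA model, the number of rows in `xreg` sets the horizon. A base-R sketch of the same pattern with `stats::arima()` and `predict(newxreg = ...)`, assuming simulated data, two hypothetical regressors named `vix` and `uix`, and a fixed ARIMA(1,0,0) order:

```r
set.seed(5)
n <- 120; h <- 12
# Hypothetical regressors (stand-ins for the volatility and uncertainty indices)
X <- cbind(vix = rnorm(n + h, 20, 5), uix = rnorm(n + h, 100, 20))
# Simulated response: weak dependence on the regressors plus AR(1) noise
e <- arima.sim(list(ar = 0.6), n = n + h)
y <- 50 + 0.1 * X[, "vix"] - 0.02 * X[, "uix"] + e

train_idx <- 1:n
test_idx <- n + seq_len(h)

fit <- arima(y[train_idx], order = c(1, 0, 0), xreg = X[train_idx, ])
# Future (test-period) regressor values drive the forecast
fc <- predict(fit, n.ahead = h, newxreg = X[test_idx, ])

rmse_test <- sqrt(mean((y[test_idx] - fc$pred)^2))
print(coef(fit))
print(rmse_test)
```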
- All RMSE values compared here come from accuracy() on the fitted models, so they measure in-sample (training-set) fit. The hybrid model's row is printed as "Test set", but it is computed from the model's fitted values.
- With the regressors added, the ensemble (hybrid) model has the lowest RMSE (119) among the models fitted on employment levels.
- However, the soft data (volatility and uncertainty indices) bring little to no improvement to the hard data (employment) forecast.
- NNETAR without regressors, fitted to the undifferenced series, has the highest RMSE (143), making it the weakest model here.
- ARIMA without regressors and ARIMAX with regressors differ very little in RMSE (123 vs 122). Adding the regressors therefore adds no meaningful value to the hard data forecast, and neither model beats the ensemble.
- Striking point:
- NNETAR does not perform well on data with trend or seasonality, or more generally on non-stationary data. Once the series is de-trended by first differencing (ndiffs() returns 1) and is therefore stationary, NNETAR fits the employment data best, with an RMSE of 3.58, lower than every other model, including the ensemble. Because this is an in-sample figure and a neural network can overfit its training data, a comparison on the test set would be needed to confirm that the ranking also holds for forecasts.
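The RMSE of the model fitted on the differenced series is in the same units (thousands of jobs) as the level models: a one-step fitted level is the previous observed level plus the fitted change, so the level error equals the change error. A toy check in base R, assuming hypothetical levels and a hypothetical vector of fitted changes standing in for `fitted(model_nnetar)`:

```r
level <- c(100, 103, 101, 106, 110, 112)  # hypothetical levels
chg <- diff(level)                        # actual changes
chg_fit <- c(2.5, -1, 4, 3.5, 1.8)        # hypothetical one-step fitted changes

# One-step fitted levels: previous actual level + fitted change
level_fit <- level[-length(level)] + chg_fit

err_chg <- chg - chg_fit
err_level <- level[-1] - level_fit
print(all.equal(err_chg, err_level))
print(sqrt(mean(err_chg^2)))
```

Percentage measures such as MAPE do not carry over the same way, because they divide by changes rather than by levels.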