Data source: https://www.kaggle.com/tunguz/us-monthly-unemployment-rate-1948-present
The business objective of this project is to forecast the US unemployment rate, the percentage of the labor force without a job, in the near future. The projection is built with Auto ARIMA, which speeds up model selection. ARIMA is one of the most popular time-series techniques for financial and economic data.
Set up Libraries
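library(tidyverse)   # data wrangling (attaches dplyr, tidyr, readr, ggplot2)
library(lubridate)   # date helpers
library(zoo)         # yearmon class for year-month dates
library(forecast)    # Arima(), auto.arima(), forecast(), accuracy()
library(TTR)         # moving averages and other smoothing functions
library(fpp)         # datasets for Forecasting: Principles and Practice
library(tseries)     # adf.test(), kpss.test()
library(xts)         # extensible time series
library(autoplotly)  # interactive plots via autoplotly()
Read the Data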
Data Transformation | Pivot longer by month
Data Transformation | Convert month names to numbers
Data Transformation | Convert Year from character to numeric
Data Transformation | Combine Year and Month into one column
Data Transformation | Remove redundant variables
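The five transformation steps above can be sketched as a single dplyr/tidyr pipeline. This is a minimal sketch, not the original code. It assumes the Kaggle CSV has a `Year` column plus one column per month named `Jan` to `Dec`, which is an assumption about the file layout. Instead of reading the file, it uses a hypothetical one-row extract built from the 1948 rates shown in the `glimpse()` output.

```r
library(dplyr)
library(tidyr)
library(zoo)

# Hypothetical one-row extract in the assumed wide layout (Year + Jan..Dec).
# Year is character, as in the raw file; the rates are the 1948 values.
usus_raw <- tibble(
  Year = "1948",
  Jan = 3.4, Feb = 3.8, Mar = 4.0, Apr = 3.9, May = 3.5, Jun = 3.6,
  Jul = 3.6, Aug = 3.9, Sep = 3.8, Oct = 3.7, Nov = 3.8, Dec = 4.0
)

usus <- usus_raw %>%
  # 1. Pivot longer: one row per (Year, month)
  pivot_longer(cols = Jan:Dec, names_to = "month", values_to = "rates") %>%
  mutate(
    # 2. Month abbreviation -> month number (month.abb is base R's "Jan".."Dec")
    month = match(month, month.abb),
    # 3. Year from character to numeric
    Year = as.numeric(Year),
    # 4. One yearmon column: yearmon stores year + (month - 1) / 12
    myear = as.yearmon(Year + (month - 1) / 12)
  ) %>%
  # 5. Drop the now-redundant columns
  select(myear, rates)

glimpse(usus)
```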
Check missing values
## myear rates
## 0 0
## [1] FALSE
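The two outputs above look like a per-column count of missing values followed by a single overall check. One way to produce them in base R is shown below on a hypothetical four-row data frame. One `NA` is planted on purpose so that the check has something to find.

```r
# Hypothetical stand-in for the reshaped data; the NA is planted on purpose
usus_demo <- data.frame(
  myear = seq(as.Date("1948-01-01"), by = "month", length.out = 4),
  rates = c(3.4, 3.8, NA, 3.9)
)

na_per_column <- colSums(is.na(usus_demo))  # missing values per column
any_missing <- any(is.na(usus_demo))        # TRUE if any cell is missing

print(na_per_column)  # myear: 0, rates: 1
print(any_missing)    # TRUE
```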
Check the new data structure
## Observations: 864
## Variables: 2
## $ myear <yearmon> Jan 1948, Feb 1948, Mar 1948, Apr 1948, May 1948, Jun 194...
## $ rates <dbl> 3.4, 3.8, 4.0, 3.9, 3.5, 3.6, 3.6, 3.9, 3.8, 3.7, 3.8, 4.0, 4...
Check the plot
Transform to a time series
## [1] "ts"
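A monthly series is usually turned into a `ts` object by giving its start date and a frequency of 12. The sketch below assumes that the data begin in January 1948, as the structure check shows, and that the data are strictly monthly. As a stand-in for the full `rates` column, the vector holds only the first 12 rates shown by `glimpse()`.

```r
# First 12 monthly rates (Jan-Dec 1948) as a stand-in for the full column
rates_1948 <- c(3.4, 3.8, 4.0, 3.9, 3.5, 3.6, 3.6, 3.9, 3.8, 3.7, 3.8, 4.0)

# frequency = 12 tells R there are 12 observations per year
usus_ts_demo <- ts(rates_1948, start = c(1948, 1), frequency = 12)

class(usus_ts_demo)  # "ts"
end(usus_ts_demo)    # 1948 12
```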
Decompose the time series
The decomposition plot gives some preliminary insights:
1. There is no clear long-term trend in the data.
2. There are visible seasonal patterns.
3. There is a noticeable random (remainder) component.
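The decomposition step can be sketched with base R's `decompose()`. It splits a series into trend, seasonal, and random (remainder) components, and the random part corresponds to point 3. To stay self-contained, the sketch uses a synthetic monthly series (flat level, fixed seasonal pattern, noise) instead of the unemployment data. Note that `decompose()` needs at least two full years of data.

```r
set.seed(123)

# Synthetic monthly series: constant level + repeating seasonal pattern + noise
n_years <- 10
season <- rep(sin(2 * pi * (1:12) / 12), n_years)
y <- ts(5.7 + 0.3 * season + rnorm(12 * n_years, sd = 0.1),
        start = c(2000, 1), frequency = 12)

dec <- decompose(y, type = "additive")

# Estimated seasonal effect per calendar month (centered so it sums to zero)
round(dec$figure, 2)

plot(dec)  # observed, trend, seasonal and random panels
```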
Before building models, we check whether the series is stationary with the Augmented Dickey-Fuller (ADF) test and the KPSS test.
Stationarity Test
With the ADF test
## Warning in adf.test(usus_ts): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: usus_ts
## Dickey-Fuller = -7.599, Lag order = 16, p-value = 0.01
## alternative hypothesis: stationary
With the KPSS test
## Warning in kpss.test(usus_ts): p-value greater than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: usus_ts
## KPSS Level = 0.12438, Truncation lag parameter = 10, p-value = 0.1
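Both tests point the same way. The ADF test rejects its null hypothesis of a unit root (p < 0.01), and the KPSS test does not reject its null hypothesis of level stationarity (p > 0.1). We therefore treat the series as stationary, which is consistent with d = 0 in the ARIMA models below.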
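The decision rule combines two tests whose null hypotheses point in opposite directions. The ADF null is a unit root (non-stationary), and the KPSS null is level stationarity. The sketch below uses `tseries::adf.test()` and `tseries::kpss.test()` on a synthetic stationary AR(1) series, which stands in for the unemployment data. The helper name `looks_stationary` and the 0.05 threshold are illustrative choices.

```r
library(tseries)

set.seed(42)
# Synthetic stationary AR(1) series around a mean of 5.7
x <- 5.7 + arima.sim(model = list(ar = 0.5), n = 500)

# Hypothetical helper: "stationary" if ADF rejects its unit-root null
# and KPSS does not reject its level-stationarity null.
# Warnings are suppressed because both tests interpolate p-values from
# tables and clip them to [0.01, 0.1], as in the warnings shown above.
looks_stationary <- function(series, alpha = 0.05) {
  adf_p <- suppressWarnings(adf.test(series)$p.value)
  kpss_p <- suppressWarnings(kpss.test(series, null = "Level")$p.value)
  list(adf_p = adf_p, kpss_p = kpss_p,
       stationary = adf_p < alpha && kpss_p > alpha)
}

res <- looks_stationary(x)
str(res)
```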
Example of the manual process
ARIMA(1,0,1)
##
## Box-Ljung test
##
## data: usus_arima_1$residuals
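## X-squared = 348.14, df = 2, p-value < 2.2e-16
The Ljung-Box p-value is far below 0.05, so the ARIMA(1,0,1) residuals are still autocorrelated and the model has not captured all of the structure in the series.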
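The manual step can be sketched with base R's `arima()` and `Box.test()`, using a synthetic ARMA(1,1) series instead of the unemployment data. The sketch passes `fitdf = p + q` so that the test's degrees of freedom account for the estimated ARMA coefficients. The lag of 10 is an illustrative choice.

```r
set.seed(7)
# Synthetic ARMA(1,1) series with mean 5.7
x <- 5.7 + arima.sim(model = list(ar = 0.6, ma = 0.3), n = 400)

# Manually chosen order p = 1, d = 0, q = 1; the mean appears as "intercept"
fit_manual <- arima(x, order = c(1, 0, 1))
print(fit_manual)

# Ljung-Box test on the residuals; fitdf = p + q = 2 estimated ARMA terms
lb <- Box.test(residuals(fit_manual), lag = 10, type = "Ljung-Box", fitdf = 2)
print(lb)

# A large p-value gives no evidence of leftover autocorrelation;
# a tiny one means the model missed some structure.
```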
Example of the automatic process (Auto ARIMA)
## Series: usus_ts
## ARIMA(3,0,2) with non-zero mean
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2 mean
## 2.6058 -2.3337 0.7250 -1.6020 0.8191 5.6838
## s.e. 0.0244 0.0516 0.0278 0.0255 0.0282 0.2156
##
## sigma^2 estimated as 0.03835: log likelihood=964.43
## AIC=-1914.86 AICc=-1914.84 BIC=-1869.89
##
## Box-Ljung test
##
## data: usus_arima$residuals
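## X-squared = 51.733, df = 2, p-value = 5.84e-12
The auto-selected ARIMA(3,0,2) leaves much less residual autocorrelation than ARIMA(1,0,1): X-squared drops from 348.14 to 51.733. The Ljung-Box test still rejects white-noise residuals, however (p = 5.84e-12).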
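The automatic step can be sketched with `forecast::auto.arima()`. It searches over ARIMA orders and keeps the model with the lowest information criterion (AICc by default). The series below is again synthetic. Deriving `fitdf` from `arimaorder()` is one reasonable way to set the Ljung-Box degrees of freedom. The package's own `checkresiduals()` is an alternative that runs the test and plots the residuals.

```r
library(forecast)

set.seed(7)
# Synthetic ARMA(1,1) series with mean 5.7 (frequency 1, no seasonality)
x <- 5.7 + arima.sim(model = list(ar = 0.6, ma = 0.3), n = 400)

# Stepwise search over (p, d, q); d is chosen with a KPSS test by default
fit_auto <- auto.arima(x)
print(fit_auto)

# Named vector of the selected orders: p, d, q
ord <- arimaorder(fit_auto)

# Ljung-Box on residuals, discounting the p + q estimated ARMA coefficients
lb_auto <- Box.test(residuals(fit_auto), lag = 10, type = "Ljung-Box",
                    fitdf = unname(ord["p"] + ord["q"]))
print(lb_auto)
```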
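To avoid shortcuts, we do not rely on in-sample fit alone. We also evaluate the model on held-out data.
Cross-Validation (Hold-Out Split)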
# hold out the last 12 observations as the test set
usus_test <- tail(usus_ts, 12)
# use all earlier observations as the training set
usus_train <- head(usus_ts, length(usus_ts) - length(usus_test))
Fitting the Auto ARIMA Model
## Series: usus_train
## ARIMA(0,0,1)(0,0,1)[64] with non-zero mean
##
## Coefficients:
## ma1 sma1 mean
## 0.9297 0.0793 5.6874
## s.e. 0.0042 0.0157 0.0273
##
## sigma^2 estimated as 0.7845: log likelihood=-5895.93
## AIC=11799.85 AICc=11799.86 BIC=11825.54
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.001158574 0.8854285 0.7017966 -4.374689 13.36767 0.3869551
## ACF1
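## Training set 0.8224825
The lag-1 autocorrelation of the training residuals (ACF1 ≈ 0.82) is high, so this model still leaves considerable structure in the residuals.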
Forecast
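A forecasting sketch with `forecast::forecast()`, which returns point forecasts plus 80% and 95% prediction intervals. The intervals give the low-to-high range around each point forecast. The sketch also shows another way to make the hold-out split, using base R's `window()`. That function subsets a `ts` by calendar time and keeps its frequency. The synthetic series, the 2019 test year and the use of `auto.arima()` on the training part are illustrative assumptions.

```r
library(forecast)

set.seed(1)
# Synthetic monthly series, Jan 2010 to Dec 2019 (120 observations)
x <- ts(5.7 + as.numeric(arima.sim(model = list(ar = 0.9), n = 120)),
        start = c(2010, 1), frequency = 12)

# Hold-out split by calendar time: train to Dec 2018, test = 2019
x_train <- window(x, end = c(2018, 12))
x_test  <- window(x, start = c(2019, 1))

fit <- auto.arima(x_train)

# Forecast as many steps as there are test observations
fc <- forecast(fit, h = length(x_test), level = c(80, 95))

fc$mean            # point forecasts
fc$lower[, "95%"]  # lower bound of the 95% interval
fc$upper[, "95%"]  # upper bound of the 95% interval
```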
Accuracy
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set -1.88258 1.897319 1.88258 -49.7979 49.7979 -0.07783706 21.22212
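The test-set metrics follow the `forecast` package convention that an error is actual minus forecast, so a negative ME means the forecasts are too high on average. The sketch below computes the main formulas in base R on hypothetical numbers. `accuracy_metrics` is an illustrative helper, not a library function. In practice, `forecast::accuracy()` reports these metrics together with MASE, ACF1 and Theil's U.

```r
# Hypothetical helper: errors are defined as actual - forecast
accuracy_metrics <- function(actual, predicted) {
  e <- actual - predicted
  c(ME   = mean(e),                       # bias (negative = over-forecasting)
    RMSE = sqrt(mean(e^2)),               # penalises large misses
    MAE  = mean(abs(e)),
    MPE  = mean(100 * e / actual),        # in percent
    MAPE = mean(100 * abs(e) / actual))   # in percent
}

# Hypothetical actual rates and forecasts that run too high
actual    <- c(4.0, 3.8, 3.6, 3.5)
predicted <- c(5.0, 5.2, 5.6, 5.5)

round(accuracy_metrics(actual, predicted), 3)
```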
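According to the forecast, the US unemployment rate will probably rise to about 5.8% towards 2020, with a prediction interval of 3.7% to 7.4%. However, the test-set errors are large: MAPE ≈ 49.8% and Theil's U ≈ 21.2. A Theil's U above 1 means the model did worse than a naive no-change forecast, so this projection should be read with caution.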