https://www.kaggle.com/tunguz/us-monthly-unemployment-rate-1948-present

Business Objective

The business objective of this project is to forecast the US unemployment rate, that is, the share of people who do not have jobs, over the near future. The forecast is built with auto ARIMA, which speeds up model selection and is one of the popular time series techniques for financial and economic data.

Preprocessing

Set up Libraries

Read the Data

Data Transformation | Pivot Longer for month

Data Transformation | Convert month to numbers

Data Transformation | Convert Year data type character to numeric

Data Transformation | Convert Year and Month to one column

Data Transformation | Remove other redundant variables
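The transformation steps above can be sketched in base R. This is only an illustration under assumptions: the column names `Year` and `Jan` to `Dec` and the object names `wide` and `long` are guesses at the layout of the Kaggle file, and the original notebook probably used a tidyverse-style pivot and a `yearmon` column (as the structure output further down suggests) rather than base R and `Date`. The single row reuses the 1948 values printed in that structure output.

```r
# Hypothetical wide layout: one row per year, one column per month.
# The 1948 values are the ones printed in the structure output of this report.
wide <- data.frame(
  Year = "1948",
  Jan = 3.4, Feb = 3.8, Mar = 4.0, Apr = 3.9, May = 3.5, Jun = 3.6,
  Jul = 3.6, Aug = 3.9, Sep = 3.8, Oct = 3.7, Nov = 3.8, Dec = 4.0,
  stringsAsFactors = FALSE
)

# Pivot longer: stack the twelve month columns into (rates, month) pairs.
long <- stack(wide[month.abb])
names(long) <- c("rates", "month")
long$Year <- rep(wide$Year, times = length(month.abb))

# Convert month abbreviations to the numbers 1..12.
long$month <- match(as.character(long$month), month.abb)

# Convert Year from character to numeric.
long$Year <- as.numeric(long$Year)

# Combine year and month into one column (first day of each month as a Date).
long$myear <- as.Date(sprintf("%04d-%02d-01", long$Year, long$month))

# Drop the now-redundant columns and sort chronologically.
long <- long[order(long$myear), c("myear", "rates")]
rownames(long) <- NULL
print(long)
```

With more than one year in `wide`, the same code applies unchanged, because `stack()` walks through the month columns one after another and `rep()` repeats the `Year` vector once per month column.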

Check missing values

## myear rates 
##     0     0
## [1] FALSE
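A check of this kind could look as follows. The original calls are not shown, so this is an assumption: `colSums(is.na(...))` gives a per-column count and `anyNA()` a single logical value, which matches the shape of the output above. The data frame `df` is a small stand-in, not the real data.

```r
# Stand-in data frame with the same two columns as in this report.
df <- data.frame(
  myear = as.Date(c("1948-01-01", "1948-02-01", "1948-03-01")),
  rates = c(3.4, 3.8, 4.0)
)

# Number of missing values per column.
na_counts <- colSums(is.na(df))
print(na_counts)

# TRUE if there is at least one missing value anywhere in the data frame.
has_na <- anyNA(df)
print(has_na)
```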

Check new data structures

## Observations: 864
## Variables: 2
## $ myear <yearmon> Jan 1948, Feb 1948, Mar 1948, Apr 1948, May 1948, Jun 194...
## $ rates <dbl> 3.4, 3.8, 4.0, 3.9, 3.5, 3.6, 3.6, 3.9, 3.8, 3.7, 3.8, 4.0, 4...

Data Overview

Check plot

Transform to time series

## [1] "ts"
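As a rough sketch, the conversion can be done with base R's `ts()`. The name `usus_ts` comes from the test output later in this report; the start date and monthly frequency follow from the data description, and only the twelve 1948 values printed in the structure output are used here.

```r
# First twelve observations (1948), taken from the structure output above.
rates <- c(3.4, 3.8, 4.0, 3.9, 3.5, 3.6, 3.6, 3.9, 3.8, 3.7, 3.8, 4.0)

# Monthly series starting in January 1948.
usus_ts <- ts(rates, start = c(1948, 1), frequency = 12)

print(class(usus_ts))
print(usus_ts)
```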

Decompose time series

From the decomposition plot we can draw some preliminary insights:
1. The data show no clear trend
2. The data show seasonality
3. A remainder (error) component is left over
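The three components listed above are what a classical decomposition returns. The sketch below uses base R's `decompose()` on a synthetic series, since the real data are not reproduced here; whether the original used `decompose()` or another method such as `stl()` is not shown, so treat the choice as an assumption.

```r
set.seed(1)

# Synthetic monthly series (NOT the unemployment data):
# constant level + repeating seasonal pattern + noise.
n <- 120
season <- rep(sin(2 * pi * (1:12) / 12), length.out = n)
x <- ts(5 + season + rnorm(n, sd = 0.2), start = c(1948, 1), frequency = 12)

# Classical additive decomposition into trend, seasonal and random parts.
parts <- decompose(x)

# The twelve estimated seasonal effects, one per calendar month.
print(round(parts$figure, 3))

# The components add back up to the series wherever the trend is defined
# (the moving-average trend is NA at both ends of the series).
recon <- parts$trend + parts$seasonal + parts$random
ok <- !is.na(recon)
print(all.equal(as.numeric(recon)[ok], as.numeric(x)[ok]))
```

Calling `plot(parts)` draws the observed, trend, seasonal and random panels in one figure.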

ARIMA Process

Before we create models, we have to check whether the data are stationary, using the ADF test and the KPSS test.

Stationary Test

With ADF test

## Warning in adf.test(usus_ts): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  usus_ts
## Dickey-Fuller = -7.599, Lag order = 16, p-value = 0.01
## alternative hypothesis: stationary

With KPSS test

## Warning in kpss.test(usus_ts): p-value greater than printed p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  usus_ts
## KPSS Level = 0.12438, Truncation lag parameter = 10, p-value = 0.1

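Both tests point the same way: the ADF test rejects its null hypothesis of a unit root (p-value below 0.01), and the KPSS test fails to reject its null hypothesis of level stationarity (p-value above 0.1). We can therefore conclude that the data are already stationary.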
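The two tests above could be run along these lines. `adf.test()` and `kpss.test()` come from the `tseries` package, as the output headers indicate; the series here is a synthetic stationary AR(1) process standing in for `usus_ts`, so the printed numbers will differ from those in this report.

```r
library(tseries)  # provides adf.test() and kpss.test()

set.seed(42)

# Synthetic stationary AR(1) series around a level of 5 (a stand-in only).
x <- ts(5 + arima.sim(model = list(ar = 0.5), n = 500), frequency = 12)

# ADF test: the null hypothesis is a unit root (non-stationarity),
# so a small p-value supports stationarity.
adf <- adf.test(x)

# KPSS test: the null hypothesis is level stationarity,
# so a large p-value supports stationarity.
kp <- kpss.test(x)

print(adf$p.value)
print(kp$p.value)
```

Because the two null hypotheses are opposites, reading the tests together is more informative than either alone. Both functions warn when the true p-value lies outside their tabulated range, which is what the warnings in the output above refer to.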

Example of manual process

ARIMA(1,0,1)

## 
##  Box-Ljung test
## 
## data:  usus_arima_1$residuals
## X-squared = 348.14, df = 2, p-value < 2.2e-16
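A manual fit followed by a residual check might look like the sketch below, using base R's `arima()` and `Box.test()`. The data are synthetic, and `lag = 2` is chosen only to mirror the `df = 2` printed above; the original call is not shown.

```r
set.seed(7)

# Synthetic ARMA(1,1) series around a mean of 5 (a stand-in, not the real data).
x <- 5 + arima.sim(model = list(ar = 0.7, ma = 0.3), n = 400)

# Fit ARIMA(1,0,1) with a mean term (reported as "intercept").
fit <- arima(x, order = c(1, 0, 1))
print(fit)

# Ljung-Box test on the residuals. The null hypothesis is that the
# residuals are not autocorrelated up to the given lag.
lb <- Box.test(residuals(fit), lag = 2, type = "Ljung-Box")
print(lb)
```

A very small p-value, as in the output above, means the residuals still carry autocorrelation that the model has not captured.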

Example of auto process

## Series: usus_ts 
## ARIMA(3,0,2) with non-zero mean 
## 
## Coefficients:
##          ar1      ar2     ar3      ma1     ma2    mean
##       2.6058  -2.3337  0.7250  -1.6020  0.8191  5.6838
## s.e.  0.0244   0.0516  0.0278   0.0255  0.0282  0.2156
## 
## sigma^2 estimated as 0.03835:  log likelihood=964.43
## AIC=-1914.86   AICc=-1914.84   BIC=-1869.89

## 
##  Box-Ljung test
## 
## data:  usus_arima$residuals
## X-squared = 51.733, df = 2, p-value = 5.84e-12
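The automatic counterpart can be sketched with `auto.arima()` from the `forecast` package, which searches over candidate orders and keeps the one with the best information criterion. The series is again synthetic, so the selected order will not match the ARIMA(3,0,2) reported above.

```r
library(forecast)  # provides auto.arima()

set.seed(3)

# Synthetic stationary AR(2) series around a level of 5 (a stand-in only).
x <- ts(5 + arima.sim(model = list(ar = c(0.6, 0.2)), n = 300), frequency = 12)

# Let auto.arima() choose the orders.
fit_auto <- auto.arima(x)
print(fit_auto)

# Same residual check as for the manual model.
lb <- Box.test(residuals(fit_auto), lag = 2, type = "Ljung-Box")
print(lb)
```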

No shortcuts

Cross Validation

Fitting Model Auto

## Series: usus_train 
## ARIMA(0,0,1)(0,0,1)[64] with non-zero mean 
## 
## Coefficients:
##          ma1    sma1    mean
##       0.9297  0.0793  5.6874
## s.e.  0.0042  0.0157  0.0273
## 
## sigma^2 estimated as 0.7845:  log likelihood=-5895.93
## AIC=11799.85   AICc=11799.86   BIC=11825.54
## 
## Training set error measures:
##                       ME      RMSE       MAE       MPE     MAPE      MASE
## Training set 0.001158574 0.8854285 0.7017966 -4.374689 13.36767 0.3869551
##                   ACF1
## Training set 0.8224825

Forecast

Accuracy

##                ME     RMSE     MAE      MPE    MAPE        ACF1 Theil's U
## Test set -1.88258 1.897319 1.88258 -49.7979 49.7979 -0.07783706  21.22212
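The hold-out workflow above (split, fit on the training part, forecast, score against the test part) can be sketched in base R. Several things here are assumptions: the data are synthetic, the 24-month test window is arbitrary, and a fixed AR(1) model fitted with `arima()` replaces the automatically selected model so that no extra package is needed. The error measures follow the usual convention of actual minus forecast.

```r
set.seed(11)

# Synthetic monthly series standing in for the unemployment rates.
x <- ts(5 + arima.sim(model = list(ar = 0.8), n = 240),
        start = c(1948, 1), frequency = 12)

# 240 months from Jan 1948 end in Dec 1967; hold out the last 24 months.
train <- window(x, end = c(1965, 12))
test  <- window(x, start = c(1966, 1))

# Fit on the training set only, then forecast over the test horizon.
fit <- arima(train, order = c(1, 0, 0))
fc  <- predict(fit, n.ahead = length(test))$pred

# Accuracy measures on the test set (error = actual - forecast).
err  <- as.numeric(test) - as.numeric(fc)
me   <- mean(err)                                   # mean error (bias)
rmse <- sqrt(mean(err^2))                           # root mean squared error
mae  <- mean(abs(err))                              # mean absolute error
mape <- mean(abs(err / as.numeric(test))) * 100     # mean absolute % error

print(round(c(ME = me, RMSE = rmse, MAE = mae, MAPE = mape), 4))
```

When ME and MAE have the same magnitude, as in the accuracy table above, every forecast error has the same sign, so the forecast sits consistently on one side of the actual values.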

Conclusion

Based on the forecast, the US unemployment rate will probably rise to about 5.8% towards 2020, within a range of 3.7% to 7.4%.