US Unemployment rate

https://www.kaggle.com/tunguz/us-monthly-unemployment-rate-1948-present

Business Objective

The business objective of this project is to forecast the US unemployment rate, i.e. the percentage of the labor force that is unemployed, in the near future. The projection is built with Auto ARIMA (auto.arima() from the forecast package) to speed up model selection; ARIMA is one of the most popular time series techniques for financial and economic data.

Preprocessing

Set up Libraries

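library(tidyverse)   # readr, tidyr, dplyr, ggplot2
library(lubridate)
library(zoo)         # as.yearmon()
library(forecast)    # Arima(), auto.arima(), forecast(), accuracy(), tsdisplay()
library(tseries)     # adf.test(), kpss.test()
library(TTR)
library(fpp)
library(xts)
library(autoplotly)  # interactive autoplotly() plots of forecast objects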

Read the Data

usu <- read_csv("USUnemployment.csv") 
usu$Year <- as.character(usu$Year)
head(usu)

Data Transformation | Pivot Longer for month

usu_l <- usu %>% 
  pivot_longer(-Year, names_to = "month", values_to = "rates")
head(usu_l)

Data Transformation | Convert month to numbers

usu_l$month <- match(usu_l$month,month.abb)
head(usu_l)

Data Transformation | Convert Year and month to numeric

# the result must be assigned back, otherwise the conversion is discarded
usu_l <- usu_l %>% 
  mutate(across(c(Year, month), as.numeric))
head(usu_l)

Data Transformation | Convert Year and Month to one column

usu_l$myear <- as.yearmon(paste(usu_l$Year, usu_l$month), "%Y %m")
head(usu_l)

Data Transformation | Remove other redundant variables

usus <- usu_l %>% 
  select(myear, rates)
head(usus)
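The reshaping steps above (pivot to long format, month name to number, one combined date column) can be sketched in base R on a tiny hypothetical wide table shaped like USUnemployment.csv. The values and the names wide, long and rates_ts are made up for illustration, and stack() stands in for pivot_longer(); this is not the author's pipeline, only the same idea without tidyverse.

```r
# Hypothetical wide table: one row per year, one column per month abbreviation
# (values are made up for illustration)
wide <- data.frame(Year = c(2000, 2001),
                   Jan = c(4.0, 4.2), Feb = c(4.1, 4.2), Mar = c(4.0, 4.3),
                   Apr = c(3.8, 4.4), May = c(4.0, 4.3), Jun = c(4.0, 4.5),
                   Jul = c(4.0, 4.6), Aug = c(4.1, 4.9), Sep = c(3.9, 5.0),
                   Oct = c(3.9, 5.3), Nov = c(3.9, 5.5), Dec = c(3.9, 5.7))

# Wide -> long: stack() turns the 12 month columns into (values, ind) pairs,
# column by column, so Year repeats once per month column
long <- stack(wide[, month.abb])
long$Year <- rep(wide$Year, times = 12)

# Month abbreviation -> month number, as match(month, month.abb) does above
long$month <- match(as.character(long$ind), month.abb)

# Sort chronologically; year + (month - 1)/12 is the numeric value that
# zoo's yearmon stores (Jan 2000 = 2000, Feb 2000 = 2000 + 1/12, ...)
long <- long[order(long$Year, long$month), ]
long$myear <- long$Year + (long$month - 1) / 12

rates_ts <- ts(long$values, start = c(min(long$Year), 1), frequency = 12)
head(long[, c("myear", "values")])
```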

Check missing values

colSums(is.na(usus))

## myear rates 
##     0     0

anyNA(usus)

## [1] FALSE

Check the new data structure

glimpse(usus)

## Observations: 864
## Variables: 2
## $ myear <yearmon> Jan 1948, Feb 1948, Mar 1948, Apr 1948, May 1948, Jun 194...
## $ rates <dbl> 3.4, 3.8, 4.0, 3.9, 3.5, 3.6, 3.6, 3.9, 3.8, 3.7, 3.8, 4.0, 4...
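Beyond counting NAs, it can help to confirm that the series is a complete, gap-free monthly grid: 864 rows is exactly 72 years x 12 months (1948 to 2019). A minimal base-R sketch of that check, where is_complete_monthly is a hypothetical helper and myear is a stand-in for the decimal values behind usus$myear (for the real data you would pass as.numeric(usus$myear)):

```r
# Hypothetical helper: TRUE if `t` (decimal year + (month - 1)/12, the numeric
# representation behind zoo's yearmon) is a gap-free, duplicate-free monthly grid
is_complete_monthly <- function(t, tol = 1e-8) {
  steps <- diff(sort(t))
  length(t) > 1 && all(abs(steps - 1 / 12) < tol)
}

# Stand-in for as.numeric(usus$myear): Jan 1948 .. Dec 2019
myear <- 1948 + (0:863) / 12

is_complete_monthly(myear)         # TRUE
is_complete_monthly(myear[-100])   # FALSE: one month missing
72 * 12                            # 864 rows expected for 1948-2019
```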

Data Overview

check plot

ggplot(usus, aes(myear, rates)) + 
  geom_line()

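transform to time series

The data are monthly, so the series has 12 observations per year (frequency = 12). The outputs printed below were generated with frequency = 64; because end is also fixed, that setting makes ts() recycle the 864 monthly values into 4556 observations, which is why the fitted model below reports a [64] seasonal period.

usus_ts <- ts(usus$rates, start = c(1948, 1), end = c(2019, 12), frequency = 12)
class(usus_ts)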

## [1] "ts"

plot(usus_ts)
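Why the frequency matters: when both start and end are given, ts() sizes the series from those dates and the frequency, then recycles or truncates the data to fit, without an error. A small sketch with 24 made-up monthly values shows the effect of a wrong frequency.

```r
x <- 1:24  # two years of made-up monthly values

ok  <- ts(x, start = c(2000, 1), end = c(2001, 12), frequency = 12)
bad <- ts(x, start = c(2000, 1), end = c(2001, 12), frequency = 64)

length(ok)   # 24: one value per month
length(bad)  # 76: (1 * 64 + 11) + 1 periods, so the 24 values are recycled
bad[25]      # 1: the series has started repeating itself
```

Applied to the 864 monthly rates with start = c(1948, 1) and end = c(2019, 12), the same arithmetic gives 71 * 64 + 11 + 1 = 4556 observations, consistent with the ADF lag order of 16 reported in the stationarity test (tseries uses trunc((n - 1)^(1/3)) by default, and trunc(4555^(1/3)) = 16).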

decompose time series

usus_dc <- decompose(usus_ts)
plot(usus_dc)

From the decomposition plot we can draw some preliminary insights:
1. There is no clear long-term trend in the data.
2. There is seasonality in the data.
3. There is a remaining irregular (error) component.
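decompose() fits a classical additive model, Y_t = T_t + S_t + E_t: the trend T_t is a centred moving average over one seasonal period, the seasonal term S_t is the average detrended value for each position in the cycle (centred to sum to zero), and E_t is the remainder. A minimal sketch on a simulated monthly series (made-up drift, seasonal pattern and noise) checks that the three components add back up to the data wherever the trend is defined.

```r
set.seed(1)
n <- 10 * 12
# Simulated monthly series: slow drift + fixed seasonal pattern + noise
y <- ts(5 + 0.01 * (1:n) + rep(sin(2 * pi * (1:12) / 12), 10) + rnorm(n, sd = 0.2),
        start = c(2000, 1), frequency = 12)

dc <- decompose(y)   # additive by default

# The centred 2x12 moving average leaves 6 NA values at each end of the trend
sum(is.na(dc$trend))   # 12

# Where the trend exists, trend + seasonal + random reproduces y
ok <- !is.na(dc$trend)
all.equal(as.numeric((dc$trend + dc$seasonal + dc$random)[ok]), as.numeric(y[ok]))
```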

ARIMA Process

Before creating models we have to check whether the data are stationary, using the Augmented Dickey-Fuller (ADF) test and the KPSS test.

Stationary Test

with adf test

adf.test(usus_ts)

## Warning in adf.test(usus_ts): p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  usus_ts
## Dickey-Fuller = -7.599, Lag order = 16, p-value = 0.01
## alternative hypothesis: stationary

with kpss test

kpss.test(usus_ts)

## Warning in kpss.test(usus_ts): p-value greater than printed p-value

## 
##  KPSS Test for Level Stationarity
## 
## data:  usus_ts
## KPSS Level = 0.12438, Truncation lag parameter = 10, p-value = 0.1

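The ADF null hypothesis is a unit root (non-stationary); its p-value of 0.01 (the smallest value its lookup table prints) rejects it at the 5% level. The KPSS null hypothesis is level stationarity; its p-value of 0.1 (the largest printed value) does not reject it. Both tests therefore indicate that the data are already stationary, suggesting no differencing is needed (d = 0).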
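Because ADF and KPSS have opposite null hypotheses, their p-values are read in opposite directions. A hypothetical helper, stationarity_verdict, encoding that decision rule at the 5% level and applied to the p-values printed above:

```r
# Hypothetical helper: combine ADF and KPSS p-values into one verdict.
# ADF  H0: unit root (non-stationary) -> small p argues for stationarity
# KPSS H0: level stationary           -> small p argues against stationarity
stationarity_verdict <- function(adf_p, kpss_p, alpha = 0.05) {
  adf_stationary  <- adf_p < alpha
  kpss_stationary <- kpss_p >= alpha
  if (adf_stationary && kpss_stationary) {
    "stationary"
  } else if (!adf_stationary && !kpss_stationary) {
    "non-stationary"
  } else {
    "inconclusive: tests disagree"
  }
}

# p-values printed above (both are table bounds, per the warnings)
stationarity_verdict(adf_p = 0.01, kpss_p = 0.1)   # "stationary"
```

When the two tests disagree, a common next step is to difference the series and test again rather than pick one result.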

tsdisplay(usus_ts)

tsdisplay(x = diff(usus_ts))

#ts_cor(ts.obj = diff(usus_ts))  # alternative ACF/PACF plot; requires library(TSstudio)
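tsdisplay() plots a series together with its sample ACF and PACF. The sample autocorrelation at lag k is r_k = sum((x_t - xbar) * (x_{t+k} - xbar)) / sum((x_t - xbar)^2); a short base-R sketch on made-up data computes it by hand and checks it against stats::acf().

```r
set.seed(42)
x <- cumsum(rnorm(200))   # made-up series, for illustration only

# Sample autocorrelation at lag k, using the full-sample mean and denominator
acf_manual <- function(x, k) {
  n <- length(x)
  d <- x - mean(x)
  sum(d[1:(n - k)] * d[(1 + k):n]) / sum(d^2)
}

manual  <- sapply(1:5, function(k) acf_manual(x, k))
builtin <- acf(x, lag.max = 5, plot = FALSE)$acf[2:6]   # element 1 is lag 0
all.equal(manual, builtin)   # TRUE
```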

Example of the manual process

ARIMA(1,0,1)(0,1,1): AR(1) and MA(1) terms plus one seasonal difference and a seasonal MA(1) term

usus_arima_1 <- Arima(usus_ts, order=c(1,0,1), seasonal=c(0,1,1))
tsdisplay(usus_arima_1$residuals)
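forecast::Arima() is a wrapper around stats::arima(), where the seasonal part takes the period explicitly instead of reading it from the series frequency. A sketch of the same ARIMA(1,0,1)(0,1,1)[12] specification in base R, fitted to a simulated monthly series (made-up ARMA coefficients, seasonally integrated so one seasonal difference is appropriate):

```r
set.seed(7)
# Made-up ARMA(1,1) noise
z <- arima.sim(model = list(ar = 0.5, ma = 0.3), n = 240)
# Seasonal integration y_t = y_{t-12} + z_t (stats:: avoids dplyr's filter mask)
y <- ts(as.numeric(stats::filter(z, filter = c(rep(0, 11), 1), method = "recursive")),
        start = c(2000, 1), frequency = 12)

# ARIMA(1,0,1)(0,1,1)[12]; the seasonal period is stated explicitly
fit <- arima(y, order = c(1, 0, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

coef(fit)                # named ar1, ma1, sma1 (no mean after differencing)
res <- residuals(fit)    # residuals to inspect, as with tsdisplay() above
```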

Box.test(usus_arima_1$residuals, lag = 2, type = "Ljung-Box")

## 
##  Box-Ljung test
## 
## data:  usus_arima_1$residuals
## X-squared = 348.14, df = 2, p-value < 2.2e-16
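The Ljung-Box statistic tests whether the first h autocorrelations of the residuals are jointly zero: Q = n(n + 2) * sum_{k=1..h} r_k^2 / (n - k), compared with a chi-squared distribution with h degrees of freedom (h - fitdf if fitdf is set). A sketch computing Q by hand on deliberately autocorrelated simulated data and checking it against Box.test():

```r
set.seed(3)
e <- arima.sim(model = list(ar = 0.6), n = 300)   # made-up autocorrelated "residuals"
h <- 2

n <- length(e)
r <- acf(e, lag.max = h, plot = FALSE)$acf[2:(h + 1)]   # drop lag 0
Q <- n * (n + 2) * sum(r^2 / (n - (1:h)))
p <- pchisq(Q, df = h, lower.tail = FALSE)

bt <- Box.test(e, lag = h, type = "Ljung-Box")
all.equal(Q, unname(bt$statistic))   # TRUE
```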

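The Ljung-Box p-value for usus_arima_1 is far below 0.05, so its residuals are still autocorrelated: the manual model does not capture all the structure in the data.

Example of the automatic process

usus_arima <- auto.arima(usus_ts, seasonal = FALSE)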
usus_arima

## Series: usus_ts 
## ARIMA(3,0,2) with non-zero mean 
## 
## Coefficients:
##          ar1      ar2     ar3      ma1     ma2    mean
##       2.6058  -2.3337  0.7250  -1.6020  0.8191  5.6838
## s.e.  0.0244   0.0516  0.0278   0.0255  0.0282  0.2156
## 
## sigma^2 estimated as 0.03835:  log likelihood=964.43
## AIC=-1914.86   AICc=-1914.84   BIC=-1869.89

tsdisplay(usus_arima$residuals)

Box.test(usus_arima$residuals, lag = 2, type = "Ljung-Box")

## 
##  Box-Ljung test
## 
## data:  usus_arima$residuals
## X-squared = 51.733, df = 2, p-value = 5.84e-12

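The residuals of usus_arima still fail the Ljung-Box test (p-value = 5.84e-12), although the statistic (51.7) is much smaller than for the manual model (348.1).

Without shortcuts: an exhaustive search instead of the default stepwise search with approximations (much slower, so left commented out)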

#usus_arima_NS <- auto.arima(usus_ts, stepwise=F, approximation=F)
#usus_arima_NS
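The idea of selecting an order by information criterion can be sketched with a small exhaustive grid in base R's stats::arima(). This is a simplified illustration under stated assumptions (non-seasonal, no differencing, plain AIC rather than the AICc that auto.arima() uses by default, simulated AR(2) data), not auto.arima()'s actual search.

```r
set.seed(11)
y <- arima.sim(model = list(ar = c(0.6, -0.2)), n = 300)   # simulated AR(2)

# Fit every ARMA(p, q) with p, q in 0..2 and record its AIC;
# fits that fail (e.g. non-stationary AR part from CSS) are recorded as NA
grid <- expand.grid(p = 0:2, q = 0:2)
grid$aic <- apply(grid, 1, function(o) {
  fit <- tryCatch(arima(y, order = c(o[["p"]], 0, o[["q"]])),
                  error = function(e) NULL)
  if (is.null(fit)) NA_real_ else AIC(fit)
})

best <- grid[which.min(grid$aic), ]
best   # lowest-AIC (p, q) on this grid
```

An exhaustive grid grows quickly with the maximum orders (and with seasonal terms), which is why a stepwise search is the default.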

Train/Test Split (holdout validation)

# last 12 observations (12 months, i.e. 2019 for the monthly series) for test
usus_test <- tail(usus_ts, 12)
# all earlier observations for train
usus_train <- head(usus_ts, length(usus_ts) - length(usus_test))
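The same holdout split can be written with dates via window(), which keeps the time attributes explicit. A sketch on a stand-in series with the same shape as the monthly usus_ts (the values are placeholders):

```r
# Stand-in with the same shape as the monthly usus_ts: Jan 1948 - Dec 2019
x <- ts(seq_len(864), start = c(1948, 1), frequency = 12)

train <- window(x, end = c(2018, 12))    # Jan 1948 - Dec 2018
test  <- window(x, start = c(2019, 1))   # Jan 2019 - Dec 2019

c(length(train), length(test))   # 852 12
start(test)                      # 2019 1
```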

Fitting the Model Automatically

usus_auto <- auto.arima(y = usus_train)

summary(usus_auto)

## Series: usus_train 
## ARIMA(0,0,1)(0,0,1)[64] with non-zero mean 
## 
## Coefficients:
##          ma1    sma1    mean
##       0.9297  0.0793  5.6874
## s.e.  0.0042  0.0157  0.0273
## 
## sigma^2 estimated as 0.7845:  log likelihood=-5895.93
## AIC=11799.85   AICc=11799.86   BIC=11825.54
## 
## Training set error measures:
##                       ME      RMSE       MAE       MPE     MAPE      MASE
## Training set 0.001158574 0.8854285 0.7017966 -4.374689 13.36767 0.3869551
##                   ACF1
## Training set 0.8224825

Forecast

usus_forecast <- forecast(object = usus_auto, h = 12)
autoplotly(usus_forecast)
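forecast() returns point forecasts plus prediction intervals (80% and 95% by default), shown as shaded bands in the plot. For a Gaussian ARIMA model each interval is the point forecast plus or minus z times the forecast standard error; a base-R sketch with predict() on a made-up AR(1) series at a 95% level:

```r
set.seed(5)
y <- 5 + arima.sim(model = list(ar = 0.8), n = 200)   # made-up AR(1) around 5
fit <- arima(y, order = c(1, 0, 0))

fc <- predict(fit, n.ahead = 12)   # list with $pred and $se
z  <- qnorm(0.975)                 # about 1.96 for a 95% interval
lower <- fc$pred - z * fc$se
upper <- fc$pred + z * fc$se

cbind(lower, point = fc$pred, upper)
```

The standard error grows with the horizon, so the bands widen the further ahead the forecast goes.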

Accuracy

accuracy(f = usus_forecast$mean, usus_test)

##                ME     RMSE     MAE      MPE    MAPE        ACF1 Theil's U
## Test set -1.88258 1.897319 1.88258 -49.7979 49.7979 -0.07783706  21.22212
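The test-set measures follow the usual definitions with error e = actual - forecast: ME = mean(e), RMSE = sqrt(mean(e^2)), MAE = mean(|e|), MPE = mean(100 * e / actual), MAPE = mean(|100 * e / actual|). A negative ME equal in size to the MAE, as above, means every forecast in the test period was at or above the actual rate. A sketch with a hypothetical helper, forecast_errors, on two made-up points:

```r
# Hypothetical helper reproducing the scale-dependent and percentage measures
forecast_errors <- function(actual, predicted) {
  e <- actual - predicted
  c(ME   = mean(e),
    RMSE = sqrt(mean(e^2)),
    MAE  = mean(abs(e)),
    MPE  = mean(100 * e / actual),
    MAPE = mean(abs(100 * e / actual)))
}

forecast_errors(actual = c(2, 4), predicted = c(3, 3))
# ME = 0, RMSE = 1, MAE = 1, MPE = -12.5, MAPE = 37.5
```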

Conclusion

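Based on the forecast, the US unemployment rate will probably rise to about 5.8% toward 2020, within a range of 3.7% to 7.4%. This projection should be read with caution: on the 12-month holdout the model had a MAPE of about 50% and a Theil's U of about 21 (values above 1 mean it did worse than a naive no-change forecast).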