The following code was executed to load the data needed:
source("code_Time_Series.R")
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
load("data_ice.Rdata")
There are two data sets are included within the file that has been uploaded.
First, the uni-variate time series ic that contains the monthly unemployment rate (%) for Iowa City, IA from January 1990 to November 2018.This will be explored first.
Start by examining the raw data ic and some of its basic characteristics.
ic
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1990 3.3 3.2 3.0 2.9 3.0 3.5 3.3 3.2 3.3 3.1 2.9 3.0
## 1991 3.5 3.5 3.3 2.9 3.0 3.3 3.0 2.9 3.0 2.8 2.6 2.9
## 1992 3.5 3.4 3.1 2.9 3.0 3.5 3.2 3.0 3.4 3.0 2.9 2.9
## 1993 3.4 3.1 2.9 2.6 2.6 3.2 2.8 2.8 2.9 2.6 2.3 2.4
## 1994 3.0 2.9 2.6 2.4 2.3 2.8 2.5 2.6 2.6 2.5 2.4 2.4
## 1995 3.1 2.8 2.5 2.5 2.6 3.2 2.9 2.9 3.0 2.6 2.5 2.5
## 1996 3.2 3.2 2.8 2.6 2.7 3.1 2.8 2.9 2.8 2.6 2.5 2.5
## 1997 3.2 3.0 2.5 2.2 2.2 2.7 2.4 2.6 2.6 2.2 2.1 2.1
## 1998 2.6 2.4 2.3 2.0 2.0 2.5 2.5 2.4 2.4 2.1 2.0 2.0
## 1999 2.5 2.5 2.2 2.1 2.0 2.4 2.4 2.3 2.0 1.7 1.6 1.8
## 2000 2.3 2.1 2.0 1.8 1.7 1.8 1.7 2.0 1.8 1.7 1.7 2.0
## 2001 2.6 2.4 2.4 2.1 1.9 2.4 2.2 2.2 2.2 2.0 2.3 2.5
## 2002 3.2 3.0 2.9 2.7 2.5 3.1 2.9 2.9 2.7 2.5 2.5 2.7
## 2003 3.6 3.3 3.3 2.9 2.9 3.4 3.3 3.3 3.1 2.8 3.0 3.2
## 2004 3.7 3.5 3.4 3.0 3.1 3.7 3.7 3.7 3.5 3.3 3.3 3.6
## 2005 3.9 3.6 3.4 3.0 3.1 3.4 3.1 3.1 3.0 2.9 3.0 3.1
## 2006 3.3 3.0 2.8 2.4 2.4 2.9 2.7 2.6 2.5 2.4 2.4 2.5
## 2007 3.1 3.1 2.9 2.6 2.5 3.0 2.8 2.9 2.7 2.7 2.5 2.8
## 2008 3.2 3.0 2.8 2.6 2.7 3.4 3.5 3.5 3.0 3.1 3.1 3.9
## 2009 4.8 4.7 4.5 4.1 4.4 5.3 5.2 5.1 4.3 4.1 4.1 4.6
## 2010 5.0 4.8 4.7 4.1 4.0 4.5 4.6 4.5 4.1 3.9 3.8 4.1
## 2011 4.7 4.4 4.2 3.8 3.8 4.4 4.4 4.5 4.0 3.7 3.6 4.0
## 2012 4.4 4.1 4.0 3.5 3.5 4.1 4.1 4.0 3.4 3.3 3.1 3.5
## 2013 4.0 3.7 3.6 3.3 3.3 3.8 3.8 3.6 3.2 3.1 2.9 3.1
## 2014 3.6 3.5 3.4 2.9 3.0 3.4 3.5 3.3 3.0 2.7 2.5 2.8
## 2015 3.3 3.0 2.7 2.5 2.6 3.1 3.0 2.7 2.6 2.4 2.3 2.6
## 2016 3.1 2.9 2.7 2.4 2.4 3.0 2.8 2.8 2.6 2.4 2.1 2.4
## 2017 3.2 3.0 2.7 2.4 2.4 2.9 2.7 2.8 2.4 1.9 2.0 2.1
## 2018 2.5 2.5 2.1 1.9 1.8 2.2 2.1 2.0 1.8 1.4 1.5 1.8
autoplot(ic)
decomp <- mstl(ic)
autoplot(decomp)
The highest peak that is seen is from the great recession in 2008.
Acf(ic)
Now that the raw data and its trending and seasonality has been explored a linear regression model will be examined to see how it fits the data.
fit <- tslm(ic ~ trend + season)
aa_plot_fitted(fit)
It can be seen from the above that it does not fit the data well at all.
accuracy(fit)
## ME RMSE MAE MPE MAPE MASE
## Training set 1.804823e-17 0.6506904 0.5088313 -4.903047 18.17726 1.50103
## ACF1
## Training set 0.96906
Its also clear that the accuracy of the model is not good.Next, let’s check the residuals from the model.
checkresiduals(fit)
##
## Breusch-Godfrey test for serial correlation of order up to 24
##
## data: Residuals from Linear regression model
## LM test = 334.38, df = 24, p-value < 2.2e-16
As we can see there is a significant amount of autocorrelation utside of the boundaries suggesting seasonality. Next, a forecast will be predicted to see its performance against the actual data.
autoplot(forecast(fit))
The forecast doesn’t seem very accurate with the huge gap from the existing data to the forecasted line.Visually it doesn’t show signs of a good prediction.
As an alternative to regression, the exponential smoothing function ets was used to fit the data to see if exponential smoothing improves upon the regression model.
fit <- ets(ic)
aa_plot_fitted(fit)
The fitted model follows much more closely to the actual data showing great signs of an improved model already.
summary(fit)
## ETS(M,Ad,M)
##
## Call:
## ets(y = ic)
##
## Smoothing parameters:
## alpha = 0.738
## beta = 0.021
## gamma = 0.1678
## phi = 0.8002
##
## Initial states:
## l = 2.8062
## b = 0.1851
## s = 0.9497 0.903 0.9461 1.0116 0.982 1.0107
## 1.09 0.9272 0.912 1.0047 1.1032 1.1598
##
## sigma: 0.0506
##
## AIC AICc BIC
## 716.2361 718.3151 785.5757
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.005676813 0.1390263 0.1055804 -0.3827142 3.723773 0.3114574
## ACF1
## Training set 0.1367646
From above, in reviewing the smoothing parameter the \(\alpha\) is fairly close to 1,indicating a heavier reliance on new data. On the other hand, \(\beta\) and \(\gamma\) are fairly close to 0, indicating that the model keeps old data in mind for fitting the slope and seasonality.
In addition to this, we can see a a huge improvement in across most key measures, especially with ACF being coming down so much.
checkresiduals(fit)
##
## Ljung-Box test
##
## data: Residuals from ETS(M,Ad,M)
## Q* = 47.584, df = 7, p-value = 4.294e-08
##
## Model df: 17. Total lags used: 24
Validating the numbers that were seen above, here too we see the ACF graph showing very few lags outside of the autocorrelation boundary.
autoplot(forecast(fit))
The above prediction is a much better fit than the linear regression model in nearly every way. It visually looks better; most of the autocorrelation is under control; and the forecasts appear much more accurate.
Now, as another alternative to both regression and exponential smoothing, I plan to use the ARIMA function auto.arima to fit the data. The result will be saved in fit and performing the same commands to the above, I will critique how this fit compare to the others.
fit <- auto.arima(ic)
aa_plot_fitted(fit)
The model fits very closely to the data - similar to the Exponential Smoothing model.
summary(fit)
## Series: ic
## ARIMA(1,0,0)(0,1,2)[12]
##
## Coefficients:
## ar1 sma1 sma2
## 0.9710 -0.5837 -0.1204
## s.e. 0.0135 0.0548 0.0557
##
## sigma^2 estimated as 0.01954: log likelihood=181.43
## AIC=-354.85 AICc=-354.73 BIC=-339.58
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.003807855 0.1367377 0.1047845 -0.2445016 3.739754 0.3091098
## ACF1
## Training set -0.03707274
The code `ARIMA(1,0,0)(0,1,2 means that the model returned by R, which is the best ARIMA fit it could find, is based on a single level of differencing across seasons, two orders of seasonal moving averages, and finally a single lag.
The training set error measures are very similar to the here too against the Exponential Smoothing model. The ACF is slightly lower here.
checkresiduals(fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,0)(0,1,2)[12]
## Q* = 17.799, df = 21, p-value = 0.6617
##
## Model df: 3. Total lags used: 24
We can the ACF has come down slightly further for this model type.
autoplot(forecast(fit))
The forecast for teh ARIMA model is also closely trending near the actual.
Overall, the ARIMA model and exponential smoothing model visually, fit quite good, and the error measures are nearly identical, e.g., the MAPE is 3.74% in both. Both models have some issues with autocorrelations (lags 7, 12, and 13 for exponential smoothing and lags 5 and 7 for ARIMA). Overall, I would say these models are about the same.