DATA661 - Time Series Forecasting

library(ggplot2)
library(forecast)
library(tseries)
library(prophet)

## Loading required package: Rcpp

library(stR)

## 
## Attaching package: 'stR'

## The following object is masked from 'package:forecast':
## 
##     seasadj

library(zoo)

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(Metrics)

Part I - summary statistics and a time series chart

##        ds                   y          
##  Min.   :2011-01-01   Min.   :  59.43  
##  1st Qu.:2012-04-01   1st Qu.: 444.06  
##  Median :2013-07-01   Median : 499.84  
##  Mean   :2013-07-01   Mean   : 547.65  
##  3rd Qu.:2014-09-30   3rd Qu.: 603.16  
##  Max.   :2015-12-31   Max.   :1141.09  
##                       NA's   :33

Part II - Forecast & Prophet decomposition charts for the raw data, seasonality and trend

## Initial log joint probability = -22.4069
## Optimization terminated normally: 
##   Convergence detected: relative gradient magnitude is below tolerance

##                         Length Class      Mode     
## growth                     1   -none-     character
## changepoints              25   POSIXct    numeric  
## n.changepoints             1   -none-     numeric  
## yearly.seasonality         1   -none-     character
## weekly.seasonality         1   -none-     character
## daily.seasonality          1   -none-     logical  
## holidays                   0   -none-     NULL     
## seasonality.prior.scale    1   -none-     numeric  
## changepoint.prior.scale    1   -none-     numeric  
## holidays.prior.scale       1   -none-     numeric  
## mcmc.samples               1   -none-     numeric  
## interval.width             1   -none-     numeric  
## uncertainty.samples        1   -none-     numeric  
## specified.changepoints     1   -none-     logical  
## start                      1   POSIXct    numeric  
## y.scale                    1   -none-     numeric  
## logistic.floor             1   -none-     logical  
## t.scale                    1   -none-     numeric  
## changepoints.t            25   -none-     numeric  
## seasonalities              3   -none-     list     
## extra_regressors           0   -none-     list     
## stan.fit                   0   -none-     NULL     
## params                     6   -none-     list     
## history                    5   data.frame list     
## history.dates           1826   POSIXct    numeric

Part III - forecast data for 2016 ONLY in a data frame for subsequent analysis

Part IV - stR Decomposition

The Str & Propphet compoments depict a similar story when comparing the trends, seasonal and random error plots - that the data appears to be seaonal and somewhat stationary.

Part V - Forecast package

Forecast against test data set. na.approx was used to impute missing data points.

In order for ARIMA models to work, the data has to be stationary. A check of this, along with our visual inspection of the deasonalized trend shows that the data is stationary (adf test p-value of .05), tells us that we do not need differencing in this model. This data is seasonal.

Following this, our (p,d,q) values that are generated by auto.arime() means that our “p” of 5 means that we have spikes at 5 “lags” or each of the periods/years in our Auto-Regressive (AR) model - (see ar1, ar2, ar3,ar4 & ar5 below). As we stated earlier, our p-value is .05, so this nets a differencing value of 1 in the AR model and 3 moving average lags (ma1, ma2, ma3).

## 
##  Augmented Dickey-Fuller Test
## 
## data:  na.approx(Etseries)
## Dickey-Fuller = -3.4148, Lag order = 12, p-value = 0.05087
## alternative hypothesis: stationary

Part VI Actuals vs. Forecast for 2016

Based on the RMSE, MAE & MSE being the lowest for the prophet forecast when compared to other methods (ARIMA, Seasonal) from the forecast package - I would say this model was the best peforming of the ones we tested.

## [1] 98.99531

## [1] 108.8976

## [1] 128.3336

## [1] 72.15559

## [1] 77.774

## [1] 83.82817

## [1] 9800.071

## [1] 11858.7

## [1] 16469.51

##          rmsematrix
## Prophet    98.99531
## Seasonal  108.89764
## ARIMA     128.33358

##          maematrix
## Prophet   72.15559
## Seasonal  77.77400
## ARIMA     83.82817

##          msematrix
## Prophet   9800.071
## Seasonal 11858.696
## ARIMA    16469.508

Part VII - Additional Correlation Test

Is a basic Pearson’s R correlation test valid, when both variables are seasonal…does this make the comparison linear?

## [1] 0.8431467

## [1] 0.8073314

## [1] 0.6920107

##          cormatrix
## Prophet  0.8431467
## Seasonal 0.8073314
## ARIMA    0.6920107

Sources: Time Series

fill in missing data Seasonal Decomposition

Transform to MSTS before you can use AutoSTR

Prophet Package

ARIMA Decomposition

DATA661 - Time Series Forecasting

Blandon Casenave

10/5/2017