library(ggplot2)
library(forecast)
library(tseries)
library(prophet)
## Loading required package: Rcpp
library(stR)
##
## Attaching package: 'stR'
## The following object is masked from 'package:forecast':
##
## seasadj
library(zoo)
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(Metrics)
Part I - summary statistics and a time series chart
## ds y
## Min. :2011-01-01 Min. : 59.43
## 1st Qu.:2012-04-01 1st Qu.: 444.06
## Median :2013-07-01 Median : 499.84
## Mean :2013-07-01 Mean : 547.65
## 3rd Qu.:2014-09-30 3rd Qu.: 603.16
## Max. :2015-12-31 Max. :1141.09
## NA's :33
Part II - Forecast & Prophet decomposition charts for the raw data, seasonality and trend
## Initial log joint probability = -22.4069
## Optimization terminated normally:
## Convergence detected: relative gradient magnitude is below tolerance
## Length Class Mode
## growth 1 -none- character
## changepoints 25 POSIXct numeric
## n.changepoints 1 -none- numeric
## yearly.seasonality 1 -none- character
## weekly.seasonality 1 -none- character
## daily.seasonality 1 -none- logical
## holidays 0 -none- NULL
## seasonality.prior.scale 1 -none- numeric
## changepoint.prior.scale 1 -none- numeric
## holidays.prior.scale 1 -none- numeric
## mcmc.samples 1 -none- numeric
## interval.width 1 -none- numeric
## uncertainty.samples 1 -none- numeric
## specified.changepoints 1 -none- logical
## start 1 POSIXct numeric
## y.scale 1 -none- numeric
## logistic.floor 1 -none- logical
## t.scale 1 -none- numeric
## changepoints.t 25 -none- numeric
## seasonalities 3 -none- list
## extra_regressors 0 -none- list
## stan.fit 0 -none- NULL
## params 6 -none- list
## history 5 data.frame list
## history.dates 1826 POSIXct numeric
Part III - forecast data for 2016 ONLY in a data frame for subsequent analysis
Part IV - stR Decomposition
The Str & Propphet compoments depict a similar story when comparing the trends, seasonal and random error plots - that the data appears to be seaonal and somewhat stationary.
Part V - Forecast package
Forecast against test data set. na.approx was used to impute missing data points.
In order for ARIMA models to work, the data has to be stationary. A check of this, along with our visual inspection of the deasonalized trend shows that the data is stationary (adf test p-value of .05), tells us that we do not need differencing in this model. This data is seasonal.
Following this, our (p,d,q) values that are generated by auto.arime() means that our “p” of 5 means that we have spikes at 5 “lags” or each of the periods/years in our Auto-Regressive (AR) model - (see ar1, ar2, ar3,ar4 & ar5 below). As we stated earlier, our p-value is .05, so this nets a differencing value of 1 in the AR model and 3 moving average lags (ma1, ma2, ma3).
##
## Augmented Dickey-Fuller Test
##
## data: na.approx(Etseries)
## Dickey-Fuller = -3.4148, Lag order = 12, p-value = 0.05087
## alternative hypothesis: stationary
Part VI Actuals vs. Forecast for 2016
Based on the RMSE, MAE & MSE being the lowest for the prophet forecast when compared to other methods (ARIMA, Seasonal) from the forecast package - I would say this model was the best peforming of the ones we tested.
## [1] 98.99531
## [1] 108.8976
## [1] 128.3336
## [1] 72.15559
## [1] 77.774
## [1] 83.82817
## [1] 9800.071
## [1] 11858.7
## [1] 16469.51
## rmsematrix
## Prophet 98.99531
## Seasonal 108.89764
## ARIMA 128.33358
## maematrix
## Prophet 72.15559
## Seasonal 77.77400
## ARIMA 83.82817
## msematrix
## Prophet 9800.071
## Seasonal 11858.696
## ARIMA 16469.508
Part VII - Additional Correlation Test
Is a basic Pearson’s R correlation test valid, when both variables are seasonal…does this make the comparison linear?
## [1] 0.8431467
## [1] 0.8073314
## [1] 0.6920107
## cormatrix
## Prophet 0.8431467
## Seasonal 0.8073314
## ARIMA 0.6920107
Sources: Time Series
fill in missing data Seasonal Decomposition