Three years out
- I show the code for the three-years-out split only and leave it out for the other two splits, since it is the same code with the split date shifted (sketches of the omitted code are included below)
## 2011 forward
myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
fc <- snaive(myts.train)
checkresiduals(fc)

##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 553.11, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
myts.train %>% ets %>% forecast(h=length(myts.test), PI=FALSE) -> fc
three_years_out <- autoplot(fc) + autolayer(myts.test)
twenty_11_forward <- round(accuracy(fc,myts.test),2)
three_years_out

Two years out
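The code for this split is omitted in the original; a minimal sketch assuming it mirrors the 2011-forward block above, with the training window ending in December 2011 (the plot object name two_years_out is assumed; twenty_12_forward is the name used in the comparison later):

## 2012 forward (assumed to mirror the 2011-forward code)
myts.train <- window(myts, end=c(2011,12))
myts.test <- window(myts, start=2012)
fc <- snaive(myts.train)
checkresiduals(fc)
myts.train %>% ets %>% forecast(h=length(myts.test), PI=FALSE) -> fc
two_years_out <- autoplot(fc) + autolayer(myts.test)
twenty_12_forward <- round(accuracy(fc, myts.test), 2)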

##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 568.78, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24

One year out
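Again the code is omitted; a sketch assuming the same pattern with the split moved to the end of 2012 (the plot object name one_year_out is assumed; twenty_13_forward is the name used in the comparison later):

## 2013 forward (assumed to mirror the 2011-forward code)
myts.train <- window(myts, end=c(2012,12))
myts.test <- window(myts, start=2013)
fc <- snaive(myts.train)
checkresiduals(fc)
myts.train %>% ets %>% forecast(h=length(myts.test), PI=FALSE) -> fc
one_year_out <- autoplot(fc) + autolayer(myts.test)
twenty_13_forward <- round(accuracy(fc, myts.test), 2)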

##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 549.26, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 2013 116.0 90.6 96.3 112.9 100.8 93.1 101.9 109.1 107.6 122.7 154.8 237.1
Compare accuracy scores
# test results
# row 2 of each accuracy() matrix is the test-set accuracy
accuracy_table <- as.data.frame(rbind(twenty_11_forward[2,1:8], twenty_12_forward[2,1:8], twenty_13_forward[2,1:8]))
rownames(accuracy_table) <- c("three_years_out", "two_years_out", "one_year_out")
kable(accuracy_table, caption = 'Comparison of train/test split accuracy')
Comparison of train/test split accuracy

|                 |    ME |  RMSE |   MAE |   MPE |  MAPE | MASE | ACF1 | Theil's U |
|-----------------|------:|------:|------:|------:|------:|-----:|-----:|----------:|
| three_years_out |  1.99 | 11.81 |  8.83 |  0.46 |  7.75 | 1.80 | 0.45 |      0.48 |
| two_years_out   | -5.94 | 12.97 | 10.11 | -6.55 |  9.55 | 2.03 | 0.38 |      0.67 |
| one_year_out    | 17.17 | 20.87 | 17.17 | 13.26 | 13.26 | 3.38 | 0.37 |      0.81 |
One last test for the train/test split
- It looks like the variation in the data set clearly changes over time, so I wonder whether the model would be more accurate if I trained on only the past couple of years
- The model seems to have avoided autocorrelation issues, but its accuracy is still largely in line with the other train/test split models
myts.train <- window(myts, start=c(2010,1),end=2013)  # train only on the most recent years (2010 onward)
myts.test <- window(myts, start=2013)
fc <- snaive(myts.train)
checkresiduals(fc)

##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 12.239, df = 7.4, p-value = 0.1107
##
## Model df: 0. Total lags used: 7.4
myts.train %>% ets %>% forecast(h=length(myts.test), PI=FALSE) -> fc
most_recent_plot <- autoplot(fc) + autolayer(myts.test)  # forecast against the 2013 test set
most_recent_data_only <- round(accuracy(fc,myts.test),2)[2,1:8]  # keep only the test-set accuracy row
twenty_13_forward
## ME RMSE MAE MPE MAPE MASE ACF1 Theil's U
## Training set 0.08 4.23 3.01 -0.26 4.63 0.59 0.13 NA
## Test set 17.17 20.87 17.17 13.26 13.26 3.38 0.37 0.81
accuracy_table <- rbind(accuracy_table,most_recent_data_only)
rownames(accuracy_table)[4] <-"most_recent_data_only"
kable(accuracy_table,caption= 'Comparison of train/test split accuracy')
Comparison of train/test split accuracy

|                       |    ME |  RMSE |   MAE |   MPE |  MAPE | MASE | ACF1 | Theil's U |
|-----------------------|------:|------:|------:|------:|------:|-----:|-----:|----------:|
| three_years_out       |  1.99 | 11.81 |  8.83 |  0.46 |  7.75 | 1.80 | 0.45 |      0.48 |
| two_years_out         | -5.94 | 12.97 | 10.11 | -6.55 |  9.55 | 2.03 | 0.38 |      0.67 |
| one_year_out          | 17.17 | 20.87 | 17.17 | 13.26 | 13.26 | 3.38 | 0.37 |      0.81 |
| most_recent_data_only | 17.46 | 20.31 | 17.46 | 13.73 | 13.73 | 2.28 | 0.34 |      0.84 |
Conclusion
- Overall, our accuracy scores are highly dependent on which observations end up in the training and test sets. This is why we need to perform cross-validation to get a reliable assessment of our time series models.
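As a reference for that next step, a minimal sketch of rolling-origin cross-validation with forecast::tsCV (not part of the analysis above); the choice of snaive as the forecast function and h = 12 is just an illustration:

library(forecast)
# tsCV() applies the forecast function at every origin in the series and
# returns a matrix of forecast errors, one column per horizon 1..h
e <- tsCV(myts, snaive, h = 12)
# overall RMSE across all origins and horizons
sqrt(mean(e^2, na.rm = TRUE))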