a) Compare the forecast accuracy of TSLM (seasonal and trend), ETS, ARIMA, and RW with drift. Which model fits the best?

Loading packages

library(fpp2)
library(knitr)

Loading dataset

PAYNSA = read.csv("C:/ECON 4210/PAYNSA.csv")
df = PAYNSA
df = ts(df, start=2009, frequency=12)  # monthly series beginning January 2009
y = df[,"PAYNSA"]                      # US total nonfarm payrolls, not seasonally adjusted

Properties of dataset

y %>%
  stl(t.window=13, s.window="periodic", robust=TRUE) %>%
  autoplot()

Before we get into forecasting, I felt it was worth discussing the dataset itself, since its properties help explain the behaviour of the forecasting methods. stl() provides a quick way to understand the general properties of the data: it decomposes the time series and gives a holistic picture of the overall pattern, the seasonality, the trend, and any anomalies.

The data are non-stationary with increasing variance (graph 1), show a general increasing trend since 2010 that gives the series a 'Nike swoosh' shape (graph 2), and have very strong seasonality (graph 3). The last graph, the remainder (graph 4), shows unusual observations in the first few years, until around 2010, after which the anomalies die down; this is linked with the falling trend over the first couple of years.
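For a more direct view of the seasonality, a season plot can be drawn; this is an optional sketch using fpp2's ggseasonplot(), not part of the original output:

# optional: season plot of the full series, one line per year
ggseasonplot(y, year.labels=TRUE) +
  ggtitle("PAYNSA: season plot")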

Training data & Test data

We will use data to the end of 2016 for training and data from January 2017 to August 2018 for testing.

# training sample: January 2009 - December 2016
train <- window(y, start=c(2009, 1), end=c(2016, 12))
# test sample: January 2017 - August 2018 (20 months)
test <- window(y, start=2017)
both <- window(y, start=c(2009, 1))
h = length(test)  # forecast horizon = length of the test set

Forecast Benchmarks

fit1 <- meanf(train, h=h)   # mean of the training data
fit2 <- naive(train, h=h)   # last observed value
fit3 <- snaive(train, h=h)  # same month of the previous year

Since the data have strong seasonality and trend, we might expect the seasonal naïve benchmark to fit reasonably well, as it captures the seasonal pattern (although not the trend).

ETS methods

# simple exponential smoothing
fit4 <- ses(train, h=h)
p4 = autoplot(fit4)

# Holt's linear trend method
fit5 <- holt(train, h=h)
p5 = autoplot(fit5)

# Holt's exponential trend method
fit6 <- holt(train, exponential=TRUE, h=h)
p6 = autoplot(fit6)

# Holt's damped trend method
fit7 <- holt(train, damped=TRUE, h=h)
p7 = autoplot(fit7)

# Holt-Winters with multiplicative seasonality
fit8 <- hw(train, seasonal="multiplicative", h=h)
p8 = autoplot(fit8)

# automatic ETS model selection
y.ets <- ets(train, model="ZZZ")
fit9 <- forecast(y.ets, h=h)
p9 = autoplot(fit9)

STL & TSLM methods

# STL decomposition; the seasonally adjusted series is forecast by random walk with drift
y.stl <- stl(train, t.window=15, s.window="periodic", robust=TRUE)
fit10 <- forecast(y.stl, method="rwdrift", h=h)
p10 = autoplot(fit10)

# linear regression on trend and seasonal dummies (TSLM)
tps <- tslm(train ~ trend + season)
fit11 = forecast(tps, h=h)
p11 = autoplot(fit11)

ARIMA method

ndiffs(y)
## [1] 1

One ordinary difference is required for the dataset; auto.arima will take care of this automatically. (The warning below indicates that the automatically selected model uses three or more differencing operations in total, ordinary plus seasonal.)
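As an optional extra check (a sketch, not in the original output), nsdiffs() reports how many seasonal differences the training data appear to need:

# optional: suggested number of seasonal differences
nsdiffs(train)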

# arima
fit12 = forecast(auto.arima(train), h=h)
## Warning in auto.arima(train): Having 3 or more differencing operations is
## not recommended. Please consider reducing the total number of differences.
p12 = autoplot(fit12)

Graph models

I will not plot the benchmarks, because those plots would not deliver much value. I will also split the graphs into 2 major categories - ETS methods and the other methods - since a single figure with all the graphs was getting cluttered.

ETS Methods Forecasts Plotted

gridExtra::grid.arrange(p4, p5, p6, p7, p8, p9, nrow=3)

STL, TSLM & ARIMA Method plotted

gridExtra::grid.arrange(p10, p11, p12, nrow=3)

Forecasts from simple exponential smoothing, Holt's linear trend method, Holt's exponential trend method, and Holt's damped trend method are obviously not the best forecasting models, because all of them fail to capture the seasonality in the data.

It is interesting to note that the forecast graphs for STL (random walk with drift) and the linear regression model look very similar, except for the width of the prediction intervals.
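A quick numeric check of this similarity (an optional sketch using the fit10 and fit11 objects created above):

# correlation between the STL (rwdrift) and TSLM point forecasts
cor(as.numeric(fit10$mean), as.numeric(fit11$mean))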

Accuracy measures

a1 = accuracy(fit1, test)
a2 = accuracy(fit2, test)
a3 = accuracy(fit3, test)
a4 = accuracy(fit4, test)
a5 = accuracy(fit5, test)
a6 = accuracy(fit6, test)
a7 = accuracy(fit7, test)
a8 = accuracy(fit8, test)
a9 = accuracy(fit9, test)
a10 = accuracy(fit10, test)
a11 = accuracy(fit11, test)
a12 = accuracy(fit12, test)

a.table<-rbind(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12)

row.names(a.table)<-c('Mean training', 'Mean test', 'Naive training', 'Naive test',
                      'S. Naive training', 'S. Naive test', 'ES training', 'ES test',
                      'Holt linear training', 'Holt linear test', 'Holt ES training',
                      'Holt ES test', 'Holt damped training', 'Holt damped test',
                      'Holt Winters training', 'Holt Winters test', 'ETS training',
                      'ETS test', 'STL training', 'STL test', 'linear trend training',
                      'linear trend test', 'ARIMA training', 'ARIMA test')

a.table<-as.data.frame(a.table)
a.table<-a.table[order(a.table$MASE),]
a.table
##                                  ME       RMSE        MAE          MPE
## ARIMA training        -3.925572e+01   163.0460   121.0025 -0.029283838
## Holt Winters test      3.494133e+01   186.1152   150.1987  0.022999251
## STL training          -9.037438e-12   238.4130   157.8037 -0.002599284
## Holt Winters training  8.865997e+00   373.6542   214.2609  0.006955977
## ETS training           1.212696e+02   307.9782   240.7472  0.086852817
## ARIMA test             5.548455e+02   651.4240   554.8455  0.374788325
## STL test               7.303141e+02   874.8568   730.3141  0.493003357
## Holt linear training   1.320124e+02  1053.4509   804.8774  0.094062334
## ES training            1.482213e+02  1049.8355   807.7404  0.103589128
## Holt damped training   1.603078e+02  1049.8798   809.9024  0.112763512
## Naive training         1.497684e+02  1055.3351   816.2316  0.104670615
## Holt ES training       2.074977e+02  1060.4848   838.6866  0.147080855
## linear trend training  1.212472e-12  1313.0874   999.6978 -0.008867637
## Holt linear test      -7.393170e+02  1331.6946  1018.5157 -0.509760743
## linear trend test      1.310162e+03  1351.1332  1310.1618  0.887704226
## Holt damped test       9.701774e+02  1965.4732  1663.1488  0.645427090
## ES test                9.728787e+02  1967.2900  1664.8893  0.647257534
## Naive test             9.729000e+02  1967.3005  1664.9000  0.647272015
## ETS test               2.132038e+03  2415.4764  2132.0383  1.441165158
## S. Naive training      1.862798e+03  2312.1441  2186.8452  1.337580486
## Holt ES test           1.655940e+03  2614.2614  2208.4141  1.108840389
## S. Naive test          3.218550e+03  3423.6783  3218.5500  2.182008148
## Mean training         -9.699001e-12  5054.4665  4373.4323 -0.136125598
## Mean test              1.107844e+04 11209.6212 11078.4417  7.511380393
##                             MAPE       MASE        ACF1  Theil's U
## ARIMA training        0.08931321 0.05533201 -0.08542645         NA
## Holt Winters test     0.10192942 0.06868283  0.29531334  0.1723976
## STL training          0.11713571 0.07216044  0.63457584         NA
## Holt Winters training 0.16038543 0.09797715  0.57459412         NA
## ETS training          0.17778532 0.11008879  0.47077759         NA
## ARIMA test            0.37478832 0.25371958  0.71919281  0.6184728
## STL test              0.49300336 0.33395782  0.77414699  0.8301795
## Holt linear training  0.59471352 0.36805411  0.07851415         NA
## ES training           0.59575321 0.36936332  0.08843060         NA
## Holt damped training  0.59736482 0.37035196  0.08830227         NA
## Naive training        0.60201613 0.37324616  0.08781165         NA
## Holt ES training      0.61825149 0.38351439  0.08952287         NA
## linear trend training 0.74093908 0.45714152  0.87194129         NA
## Holt linear test      0.69716415 0.46574659  0.46873167  1.1008451
## linear trend test     0.88770423 0.59911044  0.46441262  1.2796916
## Holt damped test      1.12576640 0.76052424  0.65155154  1.7632682
## ES test               1.12693452 0.76132015  0.65164101  1.7651780
## Naive test            1.12694165 0.76132502  0.65164101  1.7651901
## ETS test              1.44116516 0.97493789  0.83348747  2.2934518
## S. Naive training     1.58908201 1.00000000  0.86547728         NA
## Holt ES test          1.49246617 1.00986300  0.69734334  2.4073471
## S. Naive test         2.18200815 1.47177767  0.85144638  3.2263210
## Mean training         3.19820832 1.99988194  0.95414586         NA
## Mean test             7.51138039 5.06594681  0.65164101 10.5864915
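Since knitr is already loaded, the same table can optionally be rendered more cleanly with kable (a sketch; output not shown):

kable(a.table, digits=4)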

Benchmark accuracy

Among the benchmarks, the naïve method performed best. Its test MAPE, which summarises the general accuracy of the forecasts, is the lowest of the three; we will therefore regard the naïve model as the benchmark to beat.

Takeaway

The results show that Holt-Winters (fit8) has not only outperformed the benchmarks but also all the other forecasting models in nearly every category, with the lowest test MASE and MAPE; against the naïve benchmark in particular, its test MASE is 0.07 versus 0.76. The result makes sense as well, because Holt-Winters is an extended version of Holt's method whose forecast equation comprises level, trend, and seasonality - all of which are present in this dataset. The ARIMA model is the second-best model based on MASE. The mean method performed the worst in overall forecasting accuracy on the test set.
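The headline comparison can be pulled straight from the accuracy table (a small optional sketch using the row names defined above):

# Holt-Winters vs naive on the test set, MAPE and MASE only
a.table[c("Holt Winters test", "Naive test"), c("MAPE", "MASE")]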

Actual + Forecasted

autoplot(y, series = "Actual") + 
  ggtitle("PAYNSA: US total nonfarm payrolls (monthly)") +
  autolayer(fit8, PI = FALSE, series = "HW Forecast") +
  ylab("Payrolls (thousands of persons)") +
  theme(plot.title = element_text(size=15, face = "bold", hjust = 0.5),
        axis.text.x = element_text(size=10),
        axis.text.y = element_text(size=10))

The predicted line almost maps onto the actual series, which shows that the model is very good for forecasting PAYNSA.
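As a further diagnostic (optional; output not shown), the residuals of the winning model can be checked for remaining autocorrelation:

# residual diagnostics for the Holt-Winters fit
checkresiduals(fit8)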

Forecasted

ff = hw(y, seasonal="multiplicative", h=6)  # hw() already returns a forecast object
autoplot(ff) +
  ylab("Payrolls (thousands of persons)") +
  theme(plot.title = element_text(size=15, face = "bold", hjust = 0.5),
        axis.text.x = element_text(size=10),
        axis.text.y = element_text(size=10))

b) Now use the full data set to construct time series cross validation for each of the models in a). The forecast horizon is 6 periods. Use a window length of 60 observations. Based on MSE which model fits the best for each forecast period? For each forecast model, produce a plot of MSE against forecast period. What do you conclude?

Introducing Functions

Benchmarks

# Benchmarks: each returns a forecast object directly with horizon h,
# so no extra forecast() wrapper is needed
fmeanf <- function(x, h) {
  meanf(x, h=h)
}
fnaive <- function(x, h) {
  naive(x, h=h)
}
fsnaive <- function(x, h) {
  snaive(x, h=h)
}

ETS methods

# simple exponential smoothing
fses <- function(x, h) {
  ses(x, h=h)
}
# Holt's linear trend method
fholtlin <- function(x, h) {
  holt(x, h=h)
}
# Holt's exponential trend method
fholtexp <- function(x, h) {
  holt(x, exponential=TRUE, h=h)
}
# Holt's damped trend method
fholtdmp <- function(x, h) {
  holt(x, damped=TRUE, h=h)
}
# Holt-Winters method
fhw <- function(x, h) {
  hw(x, seasonal="multiplicative", h=h)
}
# ETS method (automatic model selection)
fets <- function(x, h) {
  y.ets <- ets(x, model="ZZZ")
  forecast(y.ets, h=h)
}

Other methods

# STL Method
fstl <- function(x, h) {
  y.stl <- stl(x, t.window=15, s.window="periodic", robust=TRUE)
  forecast(y.stl, method="rwdrift",h=h)
}
# trend plus seasonal
ftslm <- function(x, h) {
  tps <- tslm(x ~ trend + season)
  forecast(tps,h=h)
}
# ARIMA
farima <- function(x, h) {
  forecast(auto.arima(x), h=h)
}

Time Series Cross Validation

I windowed the data to its first 60 observations, which runs through the end of 2013 (January 2009 - December 2013).

win_data <- window(y, end=c(2013,12))
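For reference, tsCV() also accepts a window argument that re-estimates each model on a rolling window of fixed length; a sketch of that alternative reading of the 60-observation window (not used in what follows):

# alternative: rolling 60-observation estimation window over the full series
e_roll <- tsCV(y, farima, h=6, window=60)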

# Calling functions and generating tsCV 
e1 <- tsCV(win_data, fmeanf, h=6)
e2 <- tsCV(win_data, fnaive, h=6)
e3 <- tsCV(win_data, fsnaive, h=6)
e4 <- tsCV(win_data, fses, h=6)
e5 <- tsCV(win_data, fholtlin, h=6)
e6 <- tsCV(win_data, fholtexp, h=6)
e7 <- tsCV(win_data, fholtdmp, h=6)
e8 <- tsCV(win_data, fhw, h=6)
e9 <- tsCV(win_data, fets, h=6)
e10 <- tsCV(win_data, fstl, h=6)
e11 <- tsCV(win_data, ftslm, h=6)
e12 <- tsCV(win_data, farima, h=6)

MSE of each model class

mse1 = mean(e1^2, na.rm=TRUE)
mse2 = mean(e2^2, na.rm=TRUE)
mse3 = mean(e3^2, na.rm=TRUE)
mse4 = mean(e4^2, na.rm=TRUE)
mse5 = mean(e5^2, na.rm=TRUE)
mse6 = mean(e6^2, na.rm=TRUE)
mse7 = mean(e7^2, na.rm=TRUE)
mse8 = mean(e8^2, na.rm=TRUE)
mse9 = mean(e9^2, na.rm=TRUE)
mse10 = mean(e10^2, na.rm=TRUE)
mse11 = mean(e11^2, na.rm=TRUE)
mse12 = mean(e12^2, na.rm=TRUE)

MSE Table

a.table<-rbind(mse1, mse2, mse3, mse4, mse5, mse6, mse7, mse8, mse9, mse10, mse11, mse12)
row.names(a.table)<-c('Mean MSE', 'Naive MSE', 'S. Naive MSE',
                      'ES MSE', 'Holt linear MSE', 'Holt ES MSE',
                      'Holt damped MSE', 'Holt Winters MSE', 'ETS MSE',
                      'STL MSE', 'linear trend MSE', 'ARIMA MSE')
a.table<-as.data.frame(a.table)
colnames(a.table)<-"MSE"  # label the single column
a.table
##                      MSE
## Mean MSE         7847628
## Naive MSE        2455062
## S. Naive MSE     3624710
## ES MSE           2496739
## Holt linear MSE  3532598
## Holt ES MSE      3225950
## Holt damped MSE  3234888
## Holt Winters MSE 3679662
## ETS MSE          1694382
## STL MSE          1425249
## linear trend MSE 4575902
## ARIMA MSE        1181006
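The best model by cross-validated MSE can also be picked out programmatically (using the MSE column named above):

# model with the lowest cross-validated MSE
rownames(a.table)[which.min(a.table$MSE)]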

MSE against Forecast period

To save space, a nested loop is used here.

models = c("fmeanf","fnaive","fsnaive",
            "fses","fholtlin","fholtexp",
            "fholtdmp","fhw","fets",
            "fstl","ftslm","farima")

# Nested loop: tsCV re-estimates every model at every origin, so this is slow
for (mdl in models) {
  mse = c() # empty vector to store the MSE values for this model
  for (i in 1:6) {
    e <- tsCV(win_data, get(mdl), h=i)
    mse[i] = mean(e^2, na.rm=TRUE)
  }
  plot(mse, xlab="h", ylab="MSE",
       main=mdl, col.main="red", font.main=4)
}
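One caveat: for i > 1, tsCV() returns a matrix of errors for horizons 1 through i, so mean(e^2) above pools all horizons up to i rather than isolating horizon i. A per-horizon alternative (a sketch, shown for farima only) calls tsCV() once with h=6 and averages each column separately:

# per-horizon MSE: one tsCV call, one column of errors per horizon
e <- tsCV(win_data, farima, h=6)
mse_by_h <- colMeans(e^2, na.rm=TRUE)
plot(1:6, mse_by_h, type="b", xlab="h", ylab="MSE",
     main="farima: per-horizon MSE")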

Detailed breakdown

Benchmark MSE graphs (plots 1, 2, 3)

meanf (plot 1) shows that as the forecast period increases, the errors also increase in a linear pattern - this makes sense because it is difficult to capture seasonality through averaging. As discussed above, this is clearly the worst model, with by far the largest cross-validated MSE. naive (plot 2) is not a good model because the data do not follow a random walk; its graph shows a quadratic pattern in which the errors increase quickly at first and then level off. Seasonal naive (plot 3) shows an inverted version of plot 2, and the errors actually decrease as the horizon lengthens, which makes sense given the seasonal nature of the dataset.

ETS models MSE graphs (plots 4, 5, 6, 7, 8, 9)

Simple exponential smoothing (plot 4), Holt linear (plot 5), Holt exponential (plot 6), Holt damped (plot 7), and the ETS model (plot 9) show a similar trend to naive, with somewhat different gradients over the forecast range. Holt-Winters (plot 8) shows a pattern more like the simple mean model. Overall, the ets() model (plot 9) performed best among the ETS methods, which makes sense because ets() estimates the parameters of a range of candidate models and automatically selects the best one.
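To see which specification ets() settled on, the fitted object from part a) reports it directly (optional; output not shown):

# the ETS model chosen automatically on the training data in part a)
y.ets$method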

Other model graphs (plots 10, 11, 12)

stl (plot 10) and tslm (plot 11) have a similar pattern; in terms of cross-validated MSE, however, stl performs better (1,425,249 versus 4,575,902 in the table above). The auto.arima model (plot 12) has a graph that looks somewhat cubic, and it has the smallest spread of all the plots.

Takeaway

The ARIMA model (plot 12) performed the best, with the lowest MSE as well as the smallest spread relative to the other models. Its MSE increases as the forecast period increases, which makes sense. It is important to note the seasonal naive graph, which shows a totally different picture from any other: a decreasing error as the forecast range increases. It would be interesting to know how many forecast periods ("h") would be needed for it to get down to the MSE band of auto.arima.