Data Exploration and Decomposition

Time series components

A time series is a combination of systematic and random (noise) components. Only the systematic part can be forecasted.

random (noise) components

Noise is a series of independently distributed normal sequences: the sequence is non-correlationary, non-trend, and random. It obeys the normal distribution with mean of 0 and a variance of \(\sigma^2\).

Generating a Noise Series in R

  • set.seed(): Sets random number generator seed.

  • ts(rnorm(50))Creates a time series variable: 50 observations that are random and normally distributed Plots the time series.

set.seed(000929)
y <- ts(rnorm(50))
plot(y)

Systematic Time Series Components

  • Level: if a time series lacks trend/cycle then it fluctuates around a level. The level can be described as part of the trend component.

  • Trend: is the long term increase or decrease in the data.

  • Cycle: is a long term up and down movements - not of fixed period.

  • Season: is a repeating pattern of fixed period.

Irregular Components

  • Outliers: one-period shift from the expected value.

  • Level Shift: multi-period shifts from the expected value.

Different Decomposition Model Forms

Pure Additive Model:

\[Y_t=T_t+S_t+\epsilon_t\]

Pure Multiplicative Model:

\[Y_t=T_t*S_t*\epsilon_t\]

Mixed Multiplicative Model:

\[Y_t=T_t*S_t+\epsilon_t\]

Time Series Practice

For this exercise, you will have to load the packages “tsutils” and “forecast”. Go to the Packages tab on the right and tick on the checkbox next to “tsutils”, and also for the one next to “forecast”. Alternatively, you can also type the following statement:

library(tsutils)
library(forecast)

The package is now loaded, and it is time to load the data. Open the Notepad file called “Sales.txt”.

data <- read.table("./Sales.txt")

Given that the data is a time series, it would be useful to convert the data into a time series format. A time series format requires a seasonal frequency. In this case, since the data is monthly, the frequency would be 12. In order to do so, enter the statement:

data <- ts(data, frequency = 12, start = c(2016,1))

The above statement has converted the variable data into a time series. The frequency=12 specifies the seasonal frequency, while the start=c(2016,1) specifies that the first entry occurs on January 2016. The start is not a compulsory argument as it helps in organizing the data; however the frequency is!

If you wanted to inspect the seasonal frequency of your series, just type:

frequency(data)
[1] 12

The first task consisted of plotting the time series. This is done by using the plot() function as follows:

plot(data)

Plotting in R

For the ts format, when entering the plot() function, the output was a line graph. However for other formats the output is a scatterplot. In order to specify that you need a line graph, the following should be inserted:

# Create data
data1 <- 1:20
# Plot
plot(data1, type = "l")

  • By adding the type="l" argument, the plot will now be a line. Furthermore if you wanted to colour it, then another argument should be added as such:
plot(data1, type = "l", col = "blue")

Centred Moving Average

In order to create a CMA, the function cmav() from the “tsutils” package is used. To test this, create a variable that will have the CMA values stored in it as such:

data_cma <- cmav(data, ma = 12, fill = FALSE)
  • The first argument passed is which series is being used, which in your case is the variable called data.

  • The ma() argument specifies the length of the CMA. If this is omitted, then the default value is the seasonal frequency.

  • The fill=FALSE argument means that the first and last observations that were expected to be left blank will be left blank.You will see that the first values of the “data cma” variable are NA, since they don’t exist. If you wanted to impute values on these observations, then use the fill=TRUE argument.

If you wanted to overlay the graphs, this is how it is done:

#Plot the original time series in blue 
plot(data, col="blue")
#Plot the CMA(12) in red 
lines(data_cma, col = "red")

  • By adding the lines() statement, you can overlay a line graph on your original time series.

  • If you were to use the plot() statement, then a new graph will be plotted instead.

Seasonal Plots

In the “tsutils” package, a function has been created that will automatically calculate the seasonal indices and profile, and will display the seasonal plot. Just type:

seasplot(data)
Results of statistical testing
Evidence of trend: TRUE  (pval: 0)
Evidence of seasonality: TRUE  (pval: 0)

If you wished to stored the seasonal indices in a variable, say “data seas index”, then type:

data_seas_index <- seasplot(data)$season

data_seas_index
           1         2         3         4         5        6        7        8
s1 0.8993358 0.8708480 0.9894718 0.9700460 0.9785901 1.142859 1.253256 1.220492
s2 0.9045226 0.8526912 0.9954561 0.9629886 0.9739369 1.149342 1.258599 1.258054
s3 0.9060626 0.8414553 0.9435274 0.8961373 0.9330620 1.117414 1.258437 1.288265
s4 0.8761789 0.8231047 0.9758638 0.9511609 0.9988109 1.109283 1.272323 1.284688
s5 0.9138057 0.8474668 0.9006717 0.9822443 0.9984135 1.126217 1.303067 1.263477
          9        10        11        12
s1 1.061418 0.9065547 0.7957910 0.8893194
s2 1.085535 0.9317521 0.8182428 0.8992974
s3 1.276488 0.9005017 0.7691512 0.8280946
s4 1.057782 0.9229897 0.8119626 0.8987517
s5 1.054112 0.9520553 0.8016295 0.8837926

Decomposition

The decomp() function in the “tsutils” package can automatically decompose the trend, seasonal and irregular parts of a time series. Create a variable named “decomposition”:

decomposition <- decomp(data,decomposition = "multiplicative",outplot = TRUE)

  • The first argument is which series to decompose, which is the “data” series.

  • The decomposition argument specifies whether the decomposition should be additive or multiplicative. So if you wanted to try with an additive counterpart then alter the statement to: decomposition = "additive".

  • The outplot=TRUE specifies whether the plots should be displayed or not.

In order to see what it consists of, you need the str() function. This function will display the structure of the variable.

str(decomposition)
List of 5
 $ trend    : Time-Series [1:60, 1] from 2016 to 2021: 316 318 320 323 325 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr "V1"
 $ season   : Time-Series [1:60, 1] from 2016 to 2021: 0.9 0.847 0.961 0.953 0.977 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr "V1"
 $ irregular: Time-Series [1:60, 1] from 2016 to 2021: -0.204 7.55 9.122 5.656 0.659 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr "y"
 $ f.season : NULL
 $ g        : NULL

In order to extract for example the irregular (which is the noise component) component, then run the following command:

decomposition$irregular
              Jan          Feb          Mar          Apr          May          Jun
2016  -0.20378923   7.54956895   9.12218012   5.65649295   0.65882745   4.52782124
2017   1.58157428   1.96903835  12.32303062   3.78473752  -0.95710562   7.46037922
2018   2.28208399  -2.13819903  -6.70296331 -21.89349087 -16.92358199  -4.51925521
2019  -9.77974348  -9.97553701   6.18472104  -0.56392283   9.35538295  -8.39929725
2020   6.30861469   0.16314467 -28.06435549  13.95275954  10.32998166  -1.33297963
              Jul          Aug          Sep          Oct          Nov          Dec
2016  -5.23331338 -14.10396706 -15.26778851  -5.47355933  -1.21383462   3.25787958
2017  -3.89300666  -1.83441017  -8.01344051   3.34481258   7.04025884   7.26560985
2018  -4.17470558   9.90581088  66.89304049  -8.87791824 -12.17354977 -21.06276219
2019   1.37239090   9.43913510 -21.57245775   0.09657099   5.62069859   8.51707043
2020  16.19650458   0.23119971 -25.52008498  14.18006067   1.10636910   1.92658343

Check the Decomposition model

A question can arise when using the decomp() function:

  • Is the Multiplicative decomposition performed by this function a Pure Multiplicative Model or a Mixed Multiplicative Model?

In order to confirm this, isolate the Regular Components from the decomposition. So far, all the data is stored in the variable called “decomposition”. The first step would be to extract the regular components (Trend and Seasonality) and multiply them:

#Extract the Trend
decomposition$trend
          Jan      Feb      Mar      Apr      May      Jun      Jul      Aug      Sep
2016 315.7886 318.0808 320.3730 322.6651 324.9573 327.2495 329.5417 331.8333 334.4583
2017 348.2500 353.0000 357.6250 361.3750 364.5000 367.1667 369.4583 371.2083 372.1667
2018 375.2500 377.9167 383.6667 388.3333 389.0417 389.2917 390.1667 392.0000 394.8333
2019 410.8750 415.5000 416.0417 416.3333 420.5000 425.5000 430.7083 435.1250 437.7083
2020 456.3333 461.3750 465.2083 469.3333 472.7500 475.0417 477.3352 479.6287 481.9221
          Oct      Nov      Dec
2016 337.5417 340.5417 344.0833
2017 372.4167 372.7500 373.6250
2018 398.6667 403.0417 406.9583
2019 440.9583 445.8333 450.6250
2020 484.2156 486.5090 488.8025
#Extract the Seasonality
decomposition$season
           Jan       Feb       Mar       Apr       May       Jun       Jul       Aug
2016 0.8999811 0.8471132 0.9609982 0.9525154 0.9765627 1.1290230 1.2691364 1.2629954
2017 0.8999811 0.8471132 0.9609982 0.9525154 0.9765627 1.1290230 1.2691364 1.2629954
2018 0.8999811 0.8471132 0.9609982 0.9525154 0.9765627 1.1290230 1.2691364 1.2629954
2019 0.8999811 0.8471132 0.9609982 0.9525154 0.9765627 1.1290230 1.2691364 1.2629954
2020 0.8999811 0.8471132 0.9609982 0.9525154 0.9765627 1.1290230 1.2691364 1.2629954
           Sep       Oct       Nov       Dec
2016 1.1070670 0.9227707 0.7993554 0.8798512
2017 1.1070670 0.9227707 0.7993554 0.8798512
2018 1.1070670 0.9227707 0.7993554 0.8798512
2019 1.1070670 0.9227707 0.7993554 0.8798512
2020 1.1070670 0.9227707 0.7993554 0.8798512
#Multiply the Trend and Seasonality
regular_components <- decomposition$trend * decomposition$season

This has now created a new variable called “regular components”, which is equal to \(Trend × Seasonality\).

Now that the Regular Component has been created, you can calculate the errors for the Mixed Multiplicative Model as:

mmm_errors <- data - regular_components

The last step is to compare these errors with the errors from the decomposition. Go to the console and type:

decomposition$irregular - mmm_errors
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2016   0   0   0   0   0   0   0   0   0   0   0   0
2017   0   0   0   0   0   0   0   0   0   0   0   0
2018   0   0   0   0   0   0   0   0   0   0   0   0
2019   0   0   0   0   0   0   0   0   0   0   0   0
2020   0   0   0   0   0   0   0   0   0   0   0   0

You will see that the result is a vector of 0. This means that the Multiplicative option in this function is actually a Mixed Multiplicative Model.

Exponential Smoothing Methods with Level and Trend

This part aims at introducing data partitioning for forecasting methods fitting and assessment(Train Set and Test Set), the implementation of simple forecasting methods, exponential smoothing methods, as well as error metrics in order to compare these methods with each other (model selection).

The forecasting methods that will be used are:

The error measures that will be encountered are:

Loading the Data

The first step is to load the necessary libraries. For this session, we will use three libraries: forecast, smooth and tsutils.

4 time series only with Level and no trend or season are stored in that file, which are: ”Medium Noise”, ”High Noise”, ”Medium Noise with Level Shift”, ”High Noise with Level Shift”.

Level_Data <- read.csv("./Level_Data.csv")

In order to look at its structure, type:

str(Level_Data)
'data.frame':   48 obs. of  5 variables:
 $ Date                         : chr  "Jan-12" "Feb-12" "Mar-12" "Apr-12" ...
 $ Medium.Noise                 : num  100.8 91.5 102 96.6 95.5 ...
 $ High.Noise                   : num  106 116 110 90 109 ...
 $ Medium.Noise.with.Level.Shift: num  100.8 91.5 102 96.6 95.5 ...
 $ High.Noise.with.Level.Shift  : num  106 116 110 90 109 ...

For the remainder of this tutorial, you will work with one series. Pick the first series called Medium Noise and store it in a variable of type ts:

medium_noise <- ts(Level_Data[,2],frequency=12)
  • [,1] means that you are using the first column of the matrix Workshop3, which refers to the Medium Noise series.

Analysing the series

As usual, the first step is to visualise the series. Plot it:

plot(medium_noise)

The next step is to decompose it in order to check which components it consists of. Use either function decomp() from tsutils package or decompose() from stats package:

# plot the Additive Decomposition
decomp(medium_noise, decomposition="additive", outplot=TRUE)
$trend
        Jan       Feb       Mar       Apr       May       Jun       Jul       Aug
1  97.26088  97.41820  97.61484  97.86064  98.16789  98.55196  99.03208  99.63208
2 102.51708 101.99958 101.70875 101.75167 101.99333 102.44667 102.87083 102.65250
3  99.69292 100.12750 100.79250 101.04167 101.06292 100.55000 100.10375 100.08333
4 100.73500 100.76667 101.06500 101.37542 100.69833 100.55667 100.44319 100.35244
        Sep       Oct       Nov       Dec
1 100.93375 101.83458 102.31917 102.75792
2 101.13958  99.75625  99.63500  99.72042
3 100.45208 101.09167 101.03000 100.68625
4 100.27983 100.22175 100.17529 100.13812

$season
         Jan        Feb        Mar        Apr        May        Jun        Jul
1  3.6135296 -3.4154863  4.0197283 -2.7198472  0.7993813  0.3336774  2.2550366
2  3.6135296 -3.4154863  4.0197283 -2.7198472  0.7993813  0.3336774  2.2550366
3  3.6135296 -3.4154863  4.0197283 -2.7198472  0.7993813  0.3336774  2.2550366
4  3.6135296 -3.4154863  4.0197283 -2.7198472  0.7993813  0.3336774  2.2550366
         Aug        Sep        Oct        Nov        Dec
1 -1.0475880  0.5311874 -4.3735632  1.6201364  0.6393252
2 -1.0475880  0.5311874 -4.3735632  1.6201364  0.6393252
3 -1.0475880  0.5311874 -4.3735632  1.6201364  0.6393252
4 -1.0475880  0.5311874 -4.3735632  1.6201364  0.6393252

$irregular
           Jan          Feb          Mar          Apr          May          Jun
1  -0.07441132  -2.53270881   0.38543496   1.48920842  -3.42727266   0.85436557
2  -3.14061289   5.09590294  15.32152168   0.18818053   1.78728533  -1.55034408
3   1.92355377  -0.51201373 -12.59222832  -3.47181947   4.17770200   0.93632259
4   1.29147044  -2.05118040  -3.11472832   1.79443053  -2.53771467  -0.24034408
           Jul          Aug          Sep          Oct          Nov          Dec
1   7.07288007  -0.53449537  -3.18493739  -3.78102017   0.60069693  -5.21724188
2  -4.03586993  -8.70491204  -5.22077073   1.15731316   6.22486360   5.76025812
3  -2.51878660   5.54425463  -0.25327073   1.52189649   3.63986360  -6.32557521
4  -0.51822353   3.69515279   8.65897884   1.10181052 -10.46542414   5.78255897

$f.season
NULL

$g
NULL

# plot the Multiplicative Decomposition
decomp(medium_noise,decomposition="multiplicative", outplot=TRUE)
$trend
        Jan       Feb       Mar       Apr       May       Jun       Jul       Aug
1  97.26088  97.41820  97.61484  97.86064  98.16789  98.55196  99.03208  99.63208
2 102.51708 101.99958 101.70875 101.75167 101.99333 102.44667 102.87083 102.65250
3  99.69292 100.12750 100.79250 101.04167 101.06292 100.55000 100.10375 100.08333
4 100.73500 100.76667 101.06500 101.37542 100.69833 100.55667 100.44319 100.35244
        Sep       Oct       Nov       Dec
1 100.93375 101.83458 102.31917 102.75792
2 101.13958  99.75625  99.63500  99.72042
3 100.45208 101.09167 101.03000 100.68625
4 100.27983 100.22175 100.17529 100.13812

$season
        Jan       Feb       Mar       Apr       May       Jun       Jul       Aug
1 1.0363086 0.9654852 1.0397987 0.9730342 1.0076441 1.0034344 1.0228841 0.9901069
2 1.0363086 0.9654852 1.0397987 0.9730342 1.0076441 1.0034344 1.0228841 0.9901069
3 1.0363086 0.9654852 1.0397987 0.9730342 1.0076441 1.0034344 1.0228841 0.9901069
4 1.0363086 0.9654852 1.0397987 0.9730342 1.0076441 1.0034344 1.0228841 0.9901069
        Sep       Oct       Nov       Dec
1 1.0054381 0.9567071 1.0160520 1.0068200
2 1.0054381 0.9567071 1.0160520 1.0068200
3 1.0054381 0.9567071 1.0160520 1.0068200
4 1.0054381 0.9567071 1.0160520 1.0068200

$irregular
            Jan           Feb           Mar           Apr           May           Jun
1   0.007710648  -2.585821896   0.520220658   1.408252121  -3.378295792   0.849576339
2  -3.249336300   5.200915721  15.293374992   0.212148921   1.807020168  -1.568509286
3   1.917371938  -0.471615597 -12.583909458  -3.466996800   4.204549026   0.924671289
4   1.247452005  -1.988721530  -3.117254601   1.808253037  -2.508080731  -0.252018274
            Jul           Aug           Sep           Oct           Nov           Dec
1   7.061660702  -0.596414261  -3.202638608  -3.745866291   0.578406263  -5.278727784
2  -4.134935477  -8.736949676  -5.239591285   1.102489912   6.245659168   5.719488029
3  -2.554530047   5.486799996  -0.268352586   1.524887340   3.638266631  -6.372932308
4  -0.561734671   3.640360406   8.644833695   1.067140245 -10.453301267   5.738940162

$f.season
NULL

$g
NULL

# or do that using decompose for multiplicative:
plot(decompose(medium_noise, type="m"))

# and additive seasonality:
plot(decompose(medium_noise, type="a"))

  • As you may see, no regular components emerge for this series.

Splitting the Series

The next step is to split the data into training (in-sample) and test (out-of-sample, holdout) sets. The purpose of the former is to fit various methods only on a part instead of the whole data. Since the test set is withheld from the method fitting procedure, it is thus unknown to our forecasting methods, and can be used in assessing the accuracy of the methods.

The series must be split into training and test sets. For the sake of this example, the training set size will be equal to 36.

# find the total number of observations
medium_noise_length <- length(medium_noise)
# write down size of training set
train_length <- 36
# and the forecasting horizon
h <- 12
# create the training set
train <- ts(medium_noise[1:train_length], frequency=12)
# create the test set
test <- ts(medium_noise[(train_length+1):medium_noise_length],frequency=12)

Fitting a Naive

In order to calculate the Naive method, the naive() function from the forecast package can be used. It accepts horizon parameter h and produces forecasts straight away:

naive_method <- naive(train, h=h)

The forecasts of Naive method for the holdout can be extracted using $mean variable:

naive_forecast <- naive_method$mean

Fitting a Simple Moving Average

Recall that the forecast of a SMA of order \(\ell\) is given as:

\[ \hat{y}_{t+1|t}=\frac{1}{\ell}\sum^{t}_{i=t-\ell+1}y_i\]

where \(y_t\) is actual value on observation \(t\) and \(\hat{y}_{t+1|t}\) is one-step ahead forecast for the observation \(t+1\). In our case \(\ell = 3\), and this reduces to:

\[ \hat{y}_{t+1|t}=\frac{1}{3}\sum^{t}_{i=t-3+1}y_i\]

We will use function ma() from forecast package:

SMA <- ma(train, order=3, centre=FALSE)

Once the statement is executed, you will notice in the Environment that a variable called SMA has been created. This SMA output is defined as “Time Series” in the Environment. It contains the fitted values of SMA(3). As we remember, the forecast of SMA is a flat line, repeating the last fitted value. Let’s store the forecast from SMA in a variable:

# Firstly we get rid of NA (‘‘Not Assigned’’) values:
SMA_no_NAs <- SMA[!is.na(SMA)]
# Then form a forecast:
SMA3_forecast <- ts(rep(SMA_no_NAs[length(SMA_no_NAs)],12), frequency=12)

Error Measures

This section will utilize the previously created forecasts for the SMA to demonstrate how error measures can be calculated in R. Specifically, these metrics will be computed solely on the test set.

The first step consists of having your errors created. Recall that the errors are just the difference between the actuals and the forecasts \(e_t=y_t-\hat{y}_{t|t-1}\):

SMA3_errors <- test - SMA3_forecast

ME

Mean Error (ME): it is the average of the errors defined as:

\[ME = \frac{1}{m}\sum^{m}_{i=1}e_t\] The ME can be calculated as:

SMA3_ME <- mean(SMA3_errors)

MSE

Mean Squared Error (MSE): This is the average of the squared errors defined as:

\[MSE = \frac{1}{m}\sum^{m}_{i=1}e_t^2\]

The MSE can be calculated as:

SMA3_MSE <- mean(SMA3_errors ^ 2)

MAE

Mean Absolute Error (MAE): This is equal to the average of the absolute value of errors defined as:

\[MAE = \frac{1}{m}\sum^{m}_{i=1}|e_t|\]

The MAE can be calculated as:

SMA3_MAE <- mean(abs(SMA3_errors))

MAPE

Mean Absolute Percentage Error (MAPE): The MAPE is expressed in percentages, thus the multiplication by 100. It is defined as:

\[MAPE = 100 * \frac{1}{m}\sum^{m}_{i=1}\frac{|e_t|}{|y_t|}\]

The MAPE can be calculated as:

SMA3_MAPE <- 100 * mean(abs(SMA3_errors)/test)

Fitting Simple Exponential Smoothing (Level)

The last method is Simple Exponential Smoothing (SES) method. The SES is given by:

\[\hat{y}_{t+1|t} = \alpha y_t+(1-\alpha)\hat{y}_{t|t-1} \]

where \(\alpha\) is the smoothing parameter, which is usually assumed to lie in [0, 1] region.

The final method, through which R possesses a considerable superiority in producing over Excel, is Exponential Smoothing. For this example, since there were no regular components in our time series, the Simple Exponential Smoothing will be used. There is ets() function in forecast package and es() function in smooth package. They both fit exponential smoothing in a so called “State-space form”, but have some differences in the behaviour and available features.

ets() function

As you recall, the SES is defined by the \(\alpha\) parameter. If you want to compute a SES with \(\alpha\), for example, predefined \(\alpha= 0.15\) , then you would type the following:

ETS_ANN_0.15 <- ets(train, model="ANN", alpha=0.15)
ETS_ANN_0.15
ETS(A,N,N) 

Call:
 ets(y = train, model = "ANN", alpha = 0.15) 

  Smoothing parameters:
    alpha = 0.15 

  Initial states:
    l = 99.2358 

  sigma:  6.1351

     AIC     AICc      BIC 
261.5586 261.9223 264.7257 
  • The first parameter is the data to which SES is applied.

  • The model refers to the Error Trend Seasonality taxonomy. The Error can be either Additive (A) or Multiplicative (M). Since there is no Trend or Seasonality, they are represented by the letter ‘N’, which stands for “None”.

  • So if you want to try to fit the same SES with Multiplicative error, then you would substitute the model = “ANN” part with model = “MNN”.

  • The last parameter involved is the α which is set to 0.15. If you wanted the \(\alpha\) parameter to be optimised, then this argument should be omitted: ETS_ANN_opt <- ets(train, "ANN")

ETS_MNN_0.15 <- ets(train, "MNN", alpha=0.15)
ETS_MNN_0.15
ETS(M,N,N) 

Call:
 ets(y = train, model = "MNN", alpha = 0.15) 

  Smoothing parameters:
    alpha = 0.15 

  Initial states:
    l = 99.1685 

  sigma:  0.0611

     AIC     AICc      BIC 
261.5753 261.9389 264.7423 

ETS_ANN_opt
ETS(A,N,N) 

Call:
 ets(y = train, model = "ANN") 

  Smoothing parameters:
    alpha = 1e-04 

  Initial states:
    l = 100.6021 

  sigma:  5.8221

     AIC     AICc      BIC 
259.7885 260.5385 264.5391 

In order to retrieve the \(\alpha\) parameter, the following statement could be used:

coef(ETS_ANN_opt)
       alpha            l 
1.000169e-04 1.006021e+02 
  • This will return a vector with \(\alpha\) parameter as a first element and the initial value of SES as a second.

Now we need to produce forecasts. In order to generate them, you will need to use forecast() function in similar fashion to the Naive method. The below example will be used for the ETS ANN 0.15 method created in the beginning, where the \(\alpha\) was preset at 0.15. Finally, we can plot the data, fitted values and the forecast using the following command:

ETS_ANN_0.15_forecast <- forecast(ETS_ANN_0.15, h=h)$mean
plot(forecast(ETS_ANN_0.15, h=h))

es() function

As we have pointed out before, there is another exponential smoothing function, called es(). It has a similar syntax, but produces more information. In order to obtain forecast similar to ETS ANN 0.15, we can type:

ES_ANN_0.15 <- es(train, "ANN", persistence=0.15, h=h)
ES_ANN_0.15
Time elapsed: 0.02 seconds
Model estimated: ETS(ANN)
Persistence vector g:
alpha 
 0.15 
Initial values were optimised.

Loss function type: likelihood; Loss function value: 115.3578
Error standard deviation: 6.1351
Sample size: 36
Number of estimated parameters: 2
Number of provided parameters: 1
Number of degrees of freedom: 34
Information criteria:
     AIC     AICc      BIC     BICc 
234.7155 235.0791 237.8825 238.5341 
  • Here persistence parameter includes the vector of smoothing parameters, which in our case consists of only one value \(\alpha\).

The optimised SES can be constructed in a similar to ets() way, and the parameters can also be retrieved via coef() function:

ES_ANN_opt <- es(train, "ANN", h=h, silent="a")
coef(ES_ANN_opt)
   alpha    level 
  0.0000 100.6017 

And in case you need a forecast of a different length (let’s say 18 observations ahead), you can use forecast() function:

ES_ANN_opt_forecast <- forecast(ES_ANN_opt, h=18)$mean

Initialising Seeds

The aim of this task is to analyse the impact of different initial values on final forecasts for the SES. However, when you used the ets() and es(), the initial value was no longer an issue. That is because the both these functions treat the seed as an unknown parameter, and thus optimise it along with the smoothing parameter.

The ets() function does not permit the user to input the initial value, and so for this exercise the es() function will be used instead:

# Fit SES with fixed initial seed
es_ANN_initial_1 <- es(medium_noise, model="ANN", initial=medium_noise[1],
                       h=h, holdout=TRUE)
  • In es() function if the initial is set to a value, then it will not be optimised, and instead the function will use the number provided by the user.

  • medium noise[1] is the first entry in the training set, and so the model is initialised with this fixed value.

  • The parameter holdout in es() function defines, if the function needs to split data in two parts and produce forecast on the second (test set). If this parameter is set to TRUE, then the function will return forecast errors on the test set. They are stored in $accuracy variable:

es_ANN_initial_1$accuracy
           ME           MAE           MSE           MPE          MAPE           sCE 
 2.383333e-01  3.765000e+00  2.318078e+01  6.058165e-05  3.750751e-02  2.842895e-02 
         sMAE          sMSE          MASE         RMSSE          rMAE         rRMSE 
 3.742483e-02  2.290434e-03  6.170397e-01  6.161206e-01  5.661654e-01  6.237261e-01 
         rAME     asymmetry          sPIS 
 3.947005e-02  9.463655e-02 -1.947284e-01 

So far, you have calculated a SES on the data with a fixed seed. This will be different than in a case of SES with the optimised initial value:

# Fit SES with optimised Seed
es_ANN_opt <- es(medium_noise, model="ANN", h=h, holdout=TRUE)
  • The statement passed is the same as the previous one, with the exception that the initial argument is by default is set to NULL. By default, the es() function optimises the seed as a parameter. Just as before, let us see how the model performed:
es_ANN_opt$accuracy
          ME          MAE          MSE          MPE         MAPE          sCE 
 0.436666667  3.773055556 23.314658334  0.002028055  0.037513751  0.052086612 
        sMAE         sMSE         MASE        RMSSE         rMAE        rRMSE 
 0.037504901  0.002303662  0.618359920  0.617897209  0.567376775  0.625524632 
        rAME    asymmetry         sPIS 
 0.072315761  0.178231642 -0.348503173 

Benchmarking

A crucial practice that is sometimes overlooked in forecasting is to benchmark your forecasts with a simple forecasting method. The easiest method to use as a benchmark is the Naive. Here we will use es() function for the same purposes, recalling features of SES:

# Fit SES with optimised Seed
medium_noise_naive <- es(medium_noise, model="ANN", persistence=1,
                         h=h, holdout=TRUE)
  • this way we set smoothing parameter equal to one, which makes ETS(A,N,N) model produce Naive forecasts.
medium_noise_naive$accuracy
          ME          MAE          MSE          MPE         MAPE          sCE 
 6.038333333  6.650000000 59.585450000  0.057596778  0.064294103  0.720266397 
        sMAE         sMSE         MASE        RMSSE         rMAE        rRMSE 
 0.066102285  0.005887486  1.089857651  0.987806359  1.000000000  1.000000000 
        rAME    asymmetry         sPIS 
 1.000000000  0.910667086 -4.691671775 

Fitting Exponential Smoothing Models (Trend)

In cases, when there is a trend in data, underlying the SES, cannot be used for forecasting. In this case we need a forecasting method which can produce trends. One of these methods is Holt’s method. It has underlying statistical model ETS(A,A,N) (Additive error, Additive trend and No seasonality).

In order to see how this method works, we will use a different time series. Copy the file called “trend data.csv” to your working directory and load it into R. Save the resulting time series into trend data variable.

trend_data <- read.csv("./trend_data.csv")
trend_data <- ts(trend_data,frequency = 12)
plot(trend_data)

Note that there is no obvious seasonality in the data, which implies that an ETS model with no seasonality can be used. The data consists of 48 observations. As usual, we will use 36 of them for in-sample and 12 for the holdout.

h=12
trend_data_length <- length(trend_data)
# Split into training and test
trend_data_train <- ts(trend_data[1:36], frequency = 12)
trend_data_test <- ts(trend_data[37:trend_data_length],frequency = 12)

Now in order to estimate the parameters and the initial states of this model, we will use the ets() function:

# calculate Holt Method
ets_AAN <- ets(trend_data_train, model="AAN")
ets_AAN
ETS(A,A,N) 

Call:
 ets(y = trend_data_train, model = "AAN") 

  Smoothing parameters:
    alpha = 0.1931 
    beta  = 0.1931 

  Initial states:
    l = 1022.1387 
    b = 22.784 

  sigma:  36.1903

     AIC     AICc      BIC 
393.1595 395.1595 401.0771 
  • Both smoothing parameters, \(\alpha\) and \(\beta\), are reported to be equal to 0.19.

In case you need these parameters, you can use coef() function:

coef(ets_AAN)
       alpha         beta            l            b 
   0.1930941    0.1930940 1022.1386524   22.7839859 

Just like any other forecasting model constructed using the ets() function in R, the forecasts are produced by using the forecast function:

forecast(ets_AAN, h=h)$mean
       Jan      Feb      Mar      Apr      May      Jun      Jul      Aug      Sep
4 2027.636 2055.188 2082.741 2110.294 2137.846 2165.399 2192.951 2220.504 2248.056
       Oct      Nov      Dec
4 2275.609 2303.161 2330.714
plot(forecast(ets_AAN, h=h))

Damped Trend

If you have plotted the data, you may have noticed that the forecast produced by ETS(A,A,N) is biased (it overshoots the data in the holdout). This is a common thing in forecasting, so in order to address this issue a damped trend method has been proposed. The underlying statistical model is ETS(A,Ad,N).

In order to fit the Damped trend method, the same syntax as before can be used, but with an additional parameter:

# Calculate a Damped Holt Method
ets_AAdN <- ets(trend_data_train, model="AAN", damped=TRUE)
ets_AAdN
ETS(A,Ad,N) 

Call:
 ets(y = trend_data_train, model = "AAN", damped = TRUE) 

  Smoothing parameters:
    alpha = 0.1927 
    beta  = 0.1927 
    phi   = 0.98 

  Initial states:
    l = 1015.4688 
    b = 26.5557 

  sigma:  36.7087

     AIC     AICc      BIC 
395.0406 397.9372 404.5417 
  • a new parameter \(\phi\), representing the damping parameter, which is equal to 0.98.

  • Please note that the default argument for the ets() function is damped=NULL, meaning that if not specified, the function will have to choose between damped = TRUE and damped = FALSE, based on an Information Criterion.

The syntax for es() function is different, as it accepts “d” right in the name of the model the following way:

es_AAdN <- es(trend_data, model="AAdN", h=h, holdout=TRUE)
forecast(es_AAdN, h=h)$mean
       Jan      Feb      Mar      Apr      May      Jun      Jul      Aug      Sep
4 2026.260 2053.489 2080.387 2106.959 2133.207 2159.137 2184.751 2210.055 2235.051
       Oct      Nov      Dec
4 2259.743 2284.135 2308.231
plot(forecast(es_AAdN, h=h))

Notation for Exponential Smoothing Models

ETS:Error, Trend, Seasonality;

  • for N: None,

  • for A Additive,

  • for Ad Additive damped,

  • for M Multiplicative,

  • for Md Multiplicative damped.

Examples:

  • ETS(A,N,N) Additive error, no trend, no seasonality

  • ETS(A,A,M) Additive error, additive trend, multiplicative seasonality

Fitting Exponential Smoothing Models (Trend and Season)

In this subsection, you will see how to calculate an Additive Holt-Winters method. Recall that this method involves the presence of both trend and seasonality in the series.

Copy the trend seasonal data.csv file into your Working directory, and plot the data in order to analyse it visually.


trend_seasonal_data <- ts(trend_seasonal_data, frequency = 12)
plot(trend_seasonal_data)

  • The plot indicates that there are both trend and seasonality in the data. This means that a Holt-Winters method can be used.

Split the data into two sets with 36 and 12 observations:

# find the total number of observations
trend_seasonal_data_length <- length(trend_seasonal_data)
# write down size of training set
train_length <- 36
# and the forecasting horizon
h <- 12
# create the training set
trend_seasonal_data_train <- ts(trend_seasonal_data[1:train_length], frequency=12)
# create the test set
trend_seasonal_data_test <- ts(trend_seasonal_data[
  (train_length+1):trend_seasonal_data_length],frequency=12)

The additive Holt-Winters method has an underlying ETS(A,A,A) model. There are other types of trend-seasonal models in ETS: ETS(A,Ad,A), ETS(A,A,M), ETS(M,A,M) and so on. Constructing a seasonal model is straightforward:

# Fit a model using ets():
ets_AAA <- ets(trend_seasonal_data_train, model="AAA", damped=FALSE)
ets_AAA
ETS(A,A,A) 

Call:
 ets(y = trend_seasonal_data_train, model = "AAA", damped = FALSE) 

  Smoothing parameters:
    alpha = 0.2039 
    beta  = 0.1948 
    gamma = 1e-04 

  Initial states:
    l = 611.8942 
    b = 47.8868 
    s = -347.5472 -255.5593 433.4323 -360.1945 188.8925 93.2961
           242.9259 201.1537 224.9884 108.9299 -409.7781 -120.5396

  sigma:  22.7599

     AIC     AICc      BIC 
366.8465 400.8465 393.7663 
# Do the same using es():
es_AAA <- es(trend_seasonal_data_train, model="AAA", h=h)
es_AAA
Time elapsed: 0.08 seconds
Model estimated: ETS(AAA)
Persistence vector g:
 alpha   beta  gamma 
0.2082 0.2082 0.0073 
Initial values were optimised.

Loss function type: likelihood; Loss function value: 147.7294
Error standard deviation: 20.7227
Sample size: 36
Number of estimated parameters: 18
Number of degrees of freedom: 18
Information criteria:
     AIC     AICc      BIC     BICc 
331.4588 371.6941 359.9621 432.0541 
  • If you look at the output of these function, you will notice that there is a third smoothing parameter now, called \(\gamma\) (used for seasonal part of the model) and that now there is a set of initial seasonal components.

  • Overall ETS(A,A,A) has 18 parameters (3 smoothing parameters, 12 initial seasonal indices, 2 initials for level and trend and an estimate of standard deviation of the error).

Selecting the best model based on optimisation

So far, you have seen how to fit specific models using ets() and es() functions in R. Both of them also allow selecting the most appropriate model for the data. This is done via fitting a set of them and selecting the one that has a minimal Information Criterion. This can either be done using AICc, AIC or BIC. By default both ets() and es() use AICc.

In order to use this feature, you need to set model argument equal to "ZZZ":

# Calculate an Optimized ETS Method using ets()
ets_ZZZ <- ets(trend_seasonal_data_train, model="ZZZ")
ets_ZZZ
ETS(A,A,A) 

Call:
 ets(y = trend_seasonal_data_train, model = "ZZZ") 

  Smoothing parameters:
    alpha = 0.2039 
    beta  = 0.1948 
    gamma = 1e-04 

  Initial states:
    l = 611.8942 
    b = 47.8868 
    s = -347.5472 -255.5593 433.4323 -360.1945 188.8925 93.2961
           242.9259 201.1537 224.9884 108.9299 -409.7781 -120.5396

  sigma:  22.7599

     AIC     AICc      BIC 
366.8465 400.8465 393.7663 
# Do the same using es()
es_ZZZ <- es(trend_seasonal_data_train, model="ZZZ")
es_ZZZ
Time elapsed: 0.73 seconds
Model estimated: ETS(AMdA)
Persistence vector g:
 alpha   beta  gamma 
0.2592 0.0000 0.0000 
Damping parameter: 0.9608
Initial values were optimised.

Loss function type: likelihood; Loss function value: 147.9314
Error standard deviation: 20.2802
Sample size: 36
Number of estimated parameters: 17
Number of provided parameters: 2
Number of degrees of freedom: 19
Information criteria:
     AIC     AICc      BIC     BICc 
329.8627 363.8627 356.7826 417.7024 
  • By setting the model argument equal to "ZZZ", you ask function to explore many models and return the optimal one. In our case, the optimal model for the data is ETS(A,Md,A).

If you are sure that some components have a specific type in your data, you can pass them in the function. For example, we can be certain that there is no seasonality in the trend data, that we used in previous sections. So we can ask to exclude different types of seasonal models from the set of models (this should speed up the calculations and will result in a non-seasonal ETS model.):

# Select the most appropriate non-seasonal model with ets()
ets_ZZN <- ets(trend_data_train, model="ZZN")
ets_ZZN
ETS(M,A,N) 

Call:
 ets(y = trend_data_train, model = "ZZN") 

  Smoothing parameters:
    alpha = 0.1804 
    beta  = 0.1804 

  Initial states:
    l = 1020.0086 
    b = 25.6762 

  sigma:  0.0249

     AIC     AICc      BIC 
392.2672 394.2672 400.1848 
# The same thing with es()
es_ZZN <- es(trend_data_train, model="ZZN", silent="a")
Warning: The parameter phi is equal to one, so we reverted to the non-damped version of the model
es_ZZN
Time elapsed: 0.34 seconds
Model estimated: ETS(MMN)
Persistence vector g:
 alpha   beta 
0.3199 0.0000 
Initial values were optimised.

Loss function type: likelihood; Loss function value: 177.4165
Error standard deviation: 0.0247
Sample size: 36
Number of estimated parameters: 4
Number of provided parameters: 1
Number of degrees of freedom: 32
Information criteria:
     AIC     AICc      BIC     BICc 
362.8330 364.1234 369.1671 371.4791 

Rolling origin

So far, we have produced forecasts from one and the same point in time, from the same origin. The technique that we used is called “static origin”. The idea behind “rolling origin” is to produce several forecasts by moving the origin of the forecast.

This is an iterative procedure and on each iteration the train set size is increased by one observation, while the test set size is decreased by one.

Let’s try this technique on the example of “medium_noise” data:

# Load the Data
medium_noise <- read.csv("./medium_noise.csv", header=FALSE)
# Convert to Time Series
medium_noise <- ts(medium_noise, frequency=12, start=c(2012,1))
plot(medium_noise)

We will produce ten forecasts using ETS(A,N,N) model with the horizon of 12. First, write down the length of the series (48). Now we need to figure out the lengths of train and test sets. In order to do that you can use the following formulae:

\[T_0=T-h-n+1\]

\[T_1=h+n-1\]

where \(T_0\) is the size of train set, \(T_1\) is the size of test set, \(h\) is forecasting horizon and \(n\) is number of origins.

Now let us set training and test sets, and we do not need to preserve the time series structure of the test set, so we can simply save it as a vector:

# Set horizon and number of rolling origins
h <- 12
origins <- 10
medium_noise_length <- length(medium_noise)
train_length <- medium_noise_length - h - origins + 1
test_length <- h + origins - 1
medium_noise_train <- ts(medium_noise[1:train_length],
                         frequency = frequency(medium_noise),
                         start = start(medium_noise))
medium_noise_test <- medium_noise[(train_length+1):medium_noise_length]

Now that we managed to split the data into two parts, we will prepare matrix for forecasts and for the holdout values, so that we can easily calculate any error we want. In order to remember, what is what, we will give names to the rows and columns:

medium_noise_forecasts <- matrix(NA, nrow=origins, ncol=h)
medium_noise_holdout <- matrix(NA, nrow=origins, ncol=h)
colnames(medium_noise_forecasts) <- paste0("horizon",c(1:h))
rownames(medium_noise_forecasts) <- paste0("origin",c(1:origins))
dimnames(medium_noise_holdout) <- dimnames(medium_noise_forecasts)

Finally we can start producing forecasts using rolling origin. We will need a loop for that:

for(i in 1:origins){
# Create a ts object out of the medium_noise data
    our_train_set <- ts(medium_noise[1:(train_length+i-1)],
                        frequency=frequency(medium_noise),
                        start=start(medium_noise))
# Write down the holdout values from the test set
    medium_noise_holdout[i,] <- medium_noise_test[i-1+(1:h)]
# Produce forecasts and write them down
    medium_noise_forecasts[i,] <- forecast(ets(our_train_set,"ANN"),h=h)$mean
}

After that we will have two matrices with forecasts and holdouts. You can then work with them, calculating different types of errors. For example, MAE for each horizon can be calculated following way:

colMeans(abs(medium_noise_holdout - medium_noise_forecasts))
 horizon1  horizon2  horizon3  horizon4  horizon5  horizon6  horizon7  horizon8 
 3.641481  3.570178  3.148363  3.049997  3.170479  2.735337  2.897582  2.865993 
 horizon9 horizon10 horizon11 horizon12 
 3.200920  2.958815  3.443546  3.500544 

Now we have these two matrices and can calculate different errors and analyse how our fore- casting methods perform on different horizons. We can also use other statistical methods you may know (calculate mean, median, quantiles, standard deviation etc) for that.

---
title: "570 Forecasting & Predictive Analytics"
output: html_notebook
---
# Data Exploration and Decomposition

## Time series components
A time series is a combination of systematic and random (noise) components. Only the systematic part can be forecasted.

### random (noise) components

Noise is a series of independently distributed normal sequences: the sequence is non-correlationary, non-trend, and random. It obeys the normal distribution with mean of 0 and a variance of $\sigma^2$.

Generating a Noise Series in R

- `set.seed()`: Sets random number generator seed.

- `ts(rnorm(50))`Creates a time series variable: 50 observations that are **random** and **normally** distributed Plots the time series.

```{r}
set.seed(000929)
y <- ts(rnorm(50))
plot(y)
```

### Systematic Time Series Components

- Level: if a time series lacks trend/cycle then it fluctuates around a level. The level can be described as part of the trend component.

- Trend: is the long term increase or decrease in the data.

- Cycle: is a long term up and down movements - not of fixed period.

- Season: is a repeating pattern of fixed period.

### Irregular Components

- Outliers: one-period shift from the expected value.

- Level Shift: multi-period shifts from the expected value.

## Different Decomposition Model Forms

Pure Additive Model:

$$Y_t=T_t+S_t+\epsilon_t$$

Pure Multiplicative Model:

$$Y_t=T_t*S_t*\epsilon_t$$

Mixed Multiplicative Model:

$$Y_t=T_t*S_t+\epsilon_t$$

## Time Series Practice

For this exercise, you will have to load the packages “*tsutils*” and “*forecast*”. Go to the Packages tab on the right and tick on the checkbox next to “*tsutils*”, and also for the one next to “*forecast*”. Alternatively, you can also type the following statement:

```{r}
library(tsutils)
library(forecast)
```

The package is now loaded, and it is time to load the data. Open the Notepad file called “`Sales.txt`”.

```{r}
# Importing the data
data <- read.table("./Sales.txt")
```

Given that the data is a **time series**, it would be useful to *convert the data into a time series format*. A time series format requires a seasonal frequency. In this case, since the data is **monthly**, the frequency would be **12**. In order to do so, enter the statement:

```{r}
data <- ts(data, frequency = 12, start = c(2016,1))
```

The above statement has converted the variable data into a time series. The `frequency=12` specifies the seasonal frequency, while the `start=c(2016,1)` specifies that the first entry occurs on January 2016. The start is not a compulsory argument as it helps in organizing the data; however the frequency is!

If you wanted to inspect the seasonal frequency of your series, just type:

```{r}
frequency(data)
```

The first task consisted of plotting the time series. This is done by using the `plot()` function as follows:

```{r}
plot(data)
```

### Plotting in R

For the ts format, when entering the `plot()` function, the output was a line graph. However for other formats the output is a scatterplot. In order to specify that you need a line graph, the following should be inserted:

```{r}
# Create data
data1 <- 1:20
# Plot
plot(data1, type = "l")
```

- By adding the `type="l"` argument, the plot will now be a line. Furthermore if you wanted to colour it, then another argument should be added as such:

```{r}
plot(data1, type = "l", col = "blue")
```

### Centred Moving Average

In order to create a CMA, the function `cmav()` from the “*tsutils*” package is used. To test this, create a variable that will have the CMA values stored in it as such:

```{r}
data_cma <- cmav(data, ma = 12, fill = FALSE)
```

- The first argument passed is which series is being used, which in your case is the variable called `data`.

- The `ma()` argument specifies the length of the CMA. If this is omitted, then the default value is the seasonal frequency.

- The `fill=FALSE` argument means that the **first and last observations that were expected to be left blank** will be left blank.You will see that the first values of the “data cma” variable are NA, since they don’t exist. If you wanted to impute values on these observations, then use the `fill=TRUE` argument.

If you wanted to overlay the graphs, this is how it is done:

```{r}
#Plot the original time series in blue 
plot(data, col="blue")
#Plot the CMA(12) in red 
lines(data_cma, col = "red")
```

- By adding the `lines()` statement, you can overlay a line graph on your original time series. 

- If you were to use the `plot()` statement, then a new graph will be plotted instead.

### Seasonal Plots

In the “*tsutils*” package, a function has been created that will automatically calculate the seasonal indices and profile, and will display the seasonal plot. Just type:

```{r}
seasplot(data)
```

If you wished to stored the seasonal indices in a variable, say *“data seas index”*, then type: 

```{r}
data_seas_index <- seasplot(data)$season
data_seas_index
```

### Decomposition
The `decomp()` function in the “tsutils” package can automatically decompose the **trend**, **seasonal** and **irregular** parts of a time series. Create a variable named “decomposition”:

```{r}
decomposition <- decomp(data,decomposition = "multiplicative",outplot = TRUE)
```

- The first argument is which series to decompose, which is the “data” series.

- The `decomposition` argument specifies whether the decomposition should be **additive** or **multiplicative**. So if you wanted to try with an additive counterpart then alter the statement to: `decomposition = "additive"`.

- The `outplot=TRUE` specifies whether the plots should be displayed or not.

In order to see what it consists of, you need the `str()` function. This function will display the structure of the variable.

```{r}
str(decomposition)
```

In order to extract for example the **irregular** (which is the noise component) component, then run the following command:

```{r}
decomposition$irregular
```

### Check the Decomposition model

A question can arise when using the `decomp()` function: 

- Is the Multiplicative decomposition performed by this function a Pure Multiplicative Model or a Mixed Multiplicative Model?

In order to confirm this, isolate the **Regular Components** from the decomposition. So far, all the data is stored in the variable called “decomposition”. The first step would be to extract the **regular components** (*Trend and Seasonality*) and multiply them:

```{r}
#Extract the Trend
decomposition$trend
#Extract the Seasonality
decomposition$season
#Multiply the Trend and Seasonality
regular_components <- decomposition$trend * decomposition$season
```

This has now created a new variable called “regular components”, which is equal to $Trend × Seasonality$.

Now that the Regular Component has been created, you can calculate the errors for the **Mixed Multiplicative Model** as:

```{r}
mmm_errors <- data - regular_components
```

The last step is to **compare these errors with the errors from the decomposition**. Go to the console and type:

```{r}
decomposition$irregular - mmm_errors
```

You will see that the result is a vector of 0. This means that the Multiplicative option in this function is actually a **Mixed Multiplicative Model**.

# Exponential Smoothing Methods with Level and Trend
This part aims at introducing data partitioning for forecasting methods fitting and assessment(**Train Set and Test Set**), the implementation of *simple forecasting methods*, *exponential smoothing methods*, as well as *error metrics* in order to compare these methods with each other (model selection).

The forecasting methods that will be used are:

- Simple Moving Average (SMA);

- Naive Forecast;

- Single (or Simple) Exponential Smoothing (SES).

The error measures that will be encountered are:

- Mean Error (ME);

- Mean Squared Error (MSE);

- Mean Absolute Error (MAE);

- Mean Absolute Percentage Error (MAPE).

## Loading the Data

The first step is to load the necessary libraries. For this session, we will use three libraries: *forecast*, *smooth* and *tsutils*.

```{r include=FALSE}
library(forecast)
library(smooth)
library(tsutils)
```

4 time series only with Level and no trend or season are stored in that file, which are: ”Medium Noise”, ”High Noise”, ”Medium Noise with Level Shift”, ”High Noise with Level Shift”.

```{r}
Level_Data <- read.csv("./Level_Data.csv")
```

In order to look at its structure, type:

```{r}
str(Level_Data)
```

For the remainder of this tutorial, you will work with one series. Pick the first series called Medium Noise and store it in a variable of type `ts`:

```{r}
medium_noise <- ts(Level_Data[,2],frequency=12)
```

- `[,1]` means that you are using the first column of the matrix Workshop3, which refers to the Medium Noise series.

## Analysing the series

As usual, the first step is to visualise the series. Plot it:

```{r}
plot(medium_noise)
```

The next step is to decompose it in order to check which components it consists of. Use either function `decomp()` from *tsutils* package or `decompose()` from *stats* package:

```{r}
# plot the Additive Decomposition
decomp(medium_noise, decomposition="additive", outplot=TRUE)
# plot the Multiplicative Decomposition
decomp(medium_noise,decomposition="multiplicative", outplot=TRUE)
```

```{r}
# or do that using decompose for multiplicative:
plot(decompose(medium_noise, type="m"))
# and additive seasonality:
plot(decompose(medium_noise, type="a"))

```

- As you may see, no regular components emerge for this series.

## Splitting the Series

The next step is to split the data into **training** (in-sample) and **test** (out-of-sample, holdout) sets. The purpose of the former is to fit various methods only on a part instead of the whole data. Since the test set is withheld from the method fitting procedure, it is thus unknown to our forecasting methods, and can be used in assessing the accuracy of the methods.

The series must be split into training and test sets. For the sake of this example, the training set size will be equal to 36.

```{r}
# find the total number of observations
medium_noise_length <- length(medium_noise)
# write down size of training set
train_length <- 36
# and the forecasting horizon
h <- 12
# create the training set
train <- ts(medium_noise[1:train_length], frequency=12)
# create the test set
test <- medium_noise[(train_length+1):medium_noise_length]
```

## Fitting a Naive

In order to calculate the Naive method, the `naive()` function from the forecast package can be used. It accepts horizon parameter h and produces forecasts straight away:

```{r}
naive_method <- naive(train, h=h)
```

The forecasts of Naive method for the holdout can be extracted using `$mean` variable:

```{r}
naive_forecast <- naive_method$mean
```

## Fitting a Simple Moving Average

Recall that the forecast of a SMA of order $\ell$ is given as:

$$ \hat{y}_{t+1|t}=\frac{1}{\ell}\sum^{t}_{i=t-\ell+1}y_i$$

where $y_t$ is actual value on observation $t$ and $\hat{y}_{t+1|t}$ is one-step ahead forecast for the observation $t+1$. In our case $\ell = 3$, and this reduces to:

$$ \hat{y}_{t+1|t}=\frac{1}{3}\sum^{t}_{i=t-3+1}y_i$$

We will use function `ma()` from forecast package:

```{r}
SMA <- ma(train, order=3, centre=FALSE)
```

Once the statement is executed, you will notice in the *Environment* that a variable called `SMA` has been created. This `SMA` output is defined as “Time Series” in the *Environment*. It contains the fitted values of SMA(3). As we remember, the forecast of `SMA` is a flat line, repeating the last fitted value. Let’s store the forecast from `SMA` in a variable:

```{r}
# Firstly we get rid of NA (‘‘Not Assigned’’) values:
SMA_no_NAs <- SMA[!is.na(SMA)]
# Then form a forecast:
SMA3_forecast <- ts(rep(SMA_no_NAs[length(SMA_no_NAs)],12), frequency=12)
```

### Error Measures

This section will utilize the previously created forecasts for the SMA to demonstrate how error measures can be calculated in R. Specifically, these metrics will be computed solely on the test set.

The first step consists of having your errors created. Recall that the errors are just the difference between the actuals and the forecasts $e_t=y_t-\hat{y}_{t|t-1}$:

```{r}
SMA3_errors <- test - SMA3_forecast
```

#### ME

**Mean Error (ME)**: it is the average of the errors defined as: 

$$ME = \frac{1}{m}\sum^{m}_{i=1}e_t$$
The ME can be calculated as:

```{r}
SMA3_ME <- mean(SMA3_errors)
```

#### MSE

**Mean Squared Error (MSE)**: This is the average of the squared errors defined as:

$$MSE = \frac{1}{m}\sum^{m}_{i=1}e_t^2$$

The MSE can be calculated as:

```{r}
SMA3_MSE <- mean(SMA3_errors ^ 2)
```

#### MAE

**Mean Absolute Error (MAE)**: This is equal to the average of the absolute value of errors defined as:

$$MAE = \frac{1}{m}\sum^{m}_{i=1}|e_t|$$

The MAE can be calculated as:

```{r}
SMA3_MAE <- mean(abs(SMA3_errors))
```

#### MAPE

**Mean Absolute Percentage Error (MAPE)**: The MAPE is expressed in percentages, thus the multiplication by 100. It is defined as:

$$MAPE = 100 * \frac{1}{m}\sum^{m}_{i=1}\frac{|e_t|}{|y_t|}$$

The MAPE can be calculated as:

```{r}
SMA3_MAPE <- 100 * mean(abs(SMA3_errors)/test)
```

## Fitting Simple Exponential Smoothing (Level)

The last method is Simple Exponential Smoothing (SES) method. The SES is given by:

$$\hat{y}_{t+1|t} = \alpha y_t+(1-\alpha)\hat{y}_{t|t-1} $$

where $\alpha$ is the smoothing parameter, which is usually assumed to lie in [0, 1] region.

The final method, through which R possesses a considerable superiority in producing over Excel, is Exponential Smoothing. For this example, since there were **no regular components** in our time series, the **Simple Exponential Smoothing** will be used. There is `ets()` function in *forecast* package and `es()` function in *smooth* package. They both fit exponential smoothing in a so called “State-space form”, but have some differences in the behaviour and available features.

### ets() function

As you recall, the SES is defined by the $\alpha$ parameter. If you want to compute a SES with $\alpha$, for example, predefined $\alpha= 0.15$ , then you would type the following:

```{r}
ETS_ANN_0.15 <- ets(train, model="ANN", alpha=0.15)
ETS_ANN_0.15
```

- The first parameter is the data to which SES is applied. 

- The `model` refers to the Error Trend Seasonality taxonomy. The Error can be either Additive (A) or Multiplicative (M). Since there is no Trend or Seasonality, they are represented by the letter ‘N’, which stands for “None”.

- So if you want to try to fit the same SES with Multiplicative error, then you would substitute the `model = “ANN”` part with `model = “MNN”`.

- The last parameter involved is the α which is set to 0.15. If you wanted the $\alpha$ parameter to be optimised, then this argument should be omitted: `ETS_ANN_opt <- ets(train, "ANN")`

```{r}
ETS_MNN_0.15 <- ets(train, "MNN", alpha=0.15)
ETS_MNN_0.15
```

```{r}
ETS_ANN_opt <- ets(train, "ANN")
ETS_ANN_opt
```

In order to retrieve the $\alpha$ parameter, the following statement could be used:

```{r}
coef(ETS_ANN_opt)
```

- This will return a vector with $\alpha$ parameter as a first element and the initial value of SES as a second.

Now we need to produce forecasts. In order to generate them, you will need to use `forecast()` function in similar fashion to the Naive method. The below example will be used for the `ETS ANN 0.15` method created in the beginning, where the $\alpha$ was preset at 0.15. Finally, we can plot the data, fitted values and the forecast using the following command:

```{r}
ETS_ANN_0.15_forecast <- forecast(ETS_ANN_0.15, h=h)$mean
plot(forecast(ETS_ANN_0.15, h=h))
```

### es() function

As we have pointed out before, there is another exponential smoothing function, called `es()`. It has a similar syntax, but produces more information. In order to obtain forecast similar to `ETS ANN 0.15`, we can type:

```{r}
ES_ANN_0.15 <- es(train, "ANN", persistence=0.15, h=h)
ES_ANN_0.15
```

- Here `persistence` parameter includes the vector of smoothing parameters, which in our case consists of only one value $\alpha$.

The optimised SES can be constructed in a similar to `ets()` way, and the parameters can also be retrieved via `coef()` function:

```{r}
ES_ANN_opt <- es(train, "ANN", h=h, silent="a")
coef(ES_ANN_opt)
```

And in case you need a forecast of a different length (let’s say 18 observations ahead), you can use `forecast()` function:

```{r}
ES_ANN_opt_forecast <- forecast(ES_ANN_opt, h=18)$mean
```

### Initialising Seeds

The aim of this task is to analyse the impact of different initial values on final forecasts for the SES. However, when you used the `ets()` and `es()`, the initial value was no longer an issue. That is because the both these functions treat the seed as an unknown parameter, and thus optimise it along with the smoothing parameter.

The `ets()` function does not permit the user to input the initial value, and so for this exercise the `es()` function will be used instead:

```{r}
# Fit SES with fixed initial seed
es_ANN_initial_1 <- es(medium_noise, model="ANN", initial=medium_noise[1],
                       h=h, holdout=TRUE)
```

- In `es()` function if the `initial` is set to a value, then it will not be optimised, and instead the function will use the number provided by the user. 

- `medium noise[1]` is the first entry in the training set, and so the model is initialised with this fixed value.

- The parameter `holdout` in `es()` function defines, if the function needs to split data in two parts and produce forecast on the second (test set). If this parameter is set to `TRUE`, then the function will return forecast errors on the test set. They are stored in `$accuracy` variable:

```{r}
es_ANN_initial_1$accuracy
```

So far, you have calculated a SES on the data with a fixed seed. This will be different than in a case of SES with the optimised initial value:

```{r}
# Fit SES with optimised Seed
es_ANN_opt <- es(medium_noise, model="ANN", h=h, holdout=TRUE)
```

- The statement passed is the same as the previous one, with the exception that the `initial` argument is by default is set to `NULL`. By default, the `es()` function optimises the seed as a parameter. Just as before, let us see how the model performed:

```{r}
es_ANN_opt$accuracy
```

### Benchmarking

A crucial practice that is sometimes overlooked in forecasting is to benchmark your forecasts with a simple forecasting method. The easiest method to use as a benchmark is the Naive. Here we will use `es()` function for the same purposes, recalling features of SES:

```{r}
# Fit SES with optimised Seed
medium_noise_naive <- es(medium_noise, model="ANN", persistence=1,
                         h=h, holdout=TRUE)
```

- this way we set smoothing parameter equal to one, which makes ETS(A,N,N) model produce Naive forecasts.

```{r}
medium_noise_naive$accuracy
```

## Fitting Exponential Smoothing Models (Trend)

In cases, when there is a trend in data, underlying the SES, cannot be used for forecasting. In this case we need a forecasting method which can produce trends. One of these methods is Holt’s method. It has underlying statistical model ETS(A,A,N) (Additive error, Additive trend and No seasonality).

In order to see how this method works, we will use a different time series. Copy the file called "**trend data.csv**" to your working directory and load it into R. Save the resulting time series into trend data variable.

```{r}
trend_data <- read.csv("./trend_data.csv")
trend_data <- ts(trend_data,frequency = 12)
plot(trend_data)
```

Note that there is no obvious seasonality in the data, which implies that an ETS model with no seasonality can be used. The data consists of 48 observations. As usual, we will use 36 of them for in-sample and 12 for the holdout.

```{r}
h=12
trend_data_length <- length(trend_data)
# Split into training and test
trend_data_train <- ts(trend_data[1:36], frequency = 12)
trend_data_test <- ts(trend_data[37:trend_data_length],frequency = 12)
```

Now in order to estimate the parameters and the initial states of this model, we will use the `ets()` function:

```{r}
# calculate Holt Method
ets_AAN <- ets(trend_data_train, model="AAN")
ets_AAN
```

- Both smoothing parameters, $\alpha$ and $\beta$, are reported to be equal to 0.19.

In case you need these parameters, you can use `coef()` function:

```{r}
coef(ets_AAN)
```

Just like any other forecasting model constructed using the `ets()` function in R, the forecasts are produced by using the forecast function:

```{r}
forecast(ets_AAN, h=h)$mean
plot(forecast(ets_AAN, h=h))
```

### Damped Trend
If you have plotted the data, you may have noticed that the forecast produced by `ETS(A,A,N)` is biased (it overshoots the data in the holdout). This is a common thing in forecasting, so in order to address this issue a damped trend method has been proposed. The underlying statistical model is `ETS(A,Ad,N)`.

In order to fit the **Damped trend method**, the same syntax as before can be used, but with an additional parameter:

```{r}
# Calculate a Damped Holt Method
ets_AAdN <- ets(trend_data_train, model="AAN", damped=TRUE)
ets_AAdN
```

- a new parameter $\phi$, representing the damping parameter, which is equal to 0.98.

- Please note that the default argument for the `ets()` function is `damped=NULL`, meaning that if not specified, the function will have to choose between `damped = TRUE` and `damped = FALSE`, based on an Information Criterion.

The syntax for `es()` function is different, as it accepts “d” right in the name of the model the following way:
```{r}
es_AAdN <- es(trend_data, model="AAdN", h=h, holdout=TRUE)
forecast(es_AAdN, h=h)$mean
plot(forecast(es_AAdN, h=h))
```

## Notation for Exponential Smoothing Models

ETS:Error, Trend, Seasonality;

- for `N`: None,

- for `A` Additive,

- for `Ad` Additive damped,

- for `M` Multiplicative,

- for `Md` Multiplicative damped.

Examples:

- `ETS(A,N,N)` Additive error, no trend, no seasonality 

- `ETS(A,A,M)` Additive error, additive trend, multiplicative seasonality

## Fitting Exponential Smoothing Models (Trend and Season)

In this subsection, you will see how to calculate an Additive **Holt-Winters method**. Recall that this method involves the presence of both **trend** and **seasonality** in the series.

Copy the `trend seasonal data.csv` file into your Working directory, and plot the data in order to analyse it visually.

```{r}
trend_seasonal_data <- read.csv("./trend_seasonal_data.csv", col.names = FALSE)
trend_seasonal_data <- ts(trend_seasonal_data, frequency = 12)
plot(trend_seasonal_data)
```

- The plot indicates that there are **both trend and seasonality** in the data. This means that a Holt-Winters method can be used.

Split the data into two sets with 36 and 12 observations:

```{r}
# find the total number of observations
trend_seasonal_data_length <- length(trend_seasonal_data)
# write down size of training set
train_length <- 36
# and the forecasting horizon
h <- 12
# create the training set
trend_seasonal_data_train <- ts(trend_seasonal_data[1:train_length], frequency=12)
# create the test set
trend_seasonal_data_test <- trend_seasonal_data[(train_length+1):trend_seasonal_data_length]
```

The additive Holt-Winters method has an underlying `ETS(A,A,A)` model. There are other types of trend-seasonal models in ETS: `ETS(A,Ad,A)`, `ETS(A,A,M)`, `ETS(M,A,M)` and so on. Constructing a seasonal model is straightforward:

```{r}
# Fit a model using ets():
ets_AAA <- ets(trend_seasonal_data_train, model="AAA", damped=FALSE)
ets_AAA
```

```{r}
# Do the same using es():
es_AAA <- es(trend_seasonal_data_train, model="AAA", h=h)
es_AAA
```

- If you look at the output of these function, you will notice that there is a third smoothing parameter now, called $\gamma$ (used for seasonal part of the model) and that now there is a set of initial seasonal components. 

- Overall `ETS(A,A,A)` has 18 parameters (3 smoothing parameters, 12 initial seasonal indices, 2 initials for level and trend and an estimate of standard deviation of the error).

### Selecting the best model based on optimisation

So far, you have seen how to fit specific models using `ets()` and `es()` functions in R. Both of them also allow selecting the most appropriate model for the data. This is done via fitting a set of them and selecting the one that has a minimal **Information Criterion**. This can either be done using `AICc`, `AIC` or `BIC`. By default both `ets()` and `es()` use `AICc`.

In order to use this feature, you need to set model argument equal to `"ZZZ"`:

```{r}
# Calculate an Optimized ETS Method using ets()
ets_ZZZ <- ets(trend_seasonal_data_train, model="ZZZ")
ets_ZZZ
```

```{r}
# Do the same using es()
es_ZZZ <- es(trend_seasonal_data_train, model="ZZZ")
es_ZZZ
```

- By setting the model argument equal to `"ZZZ"`, you ask function to explore many models and return the optimal one. In our case, the optimal model for the data is ETS(A,Md,A).

If you are sure that some components have a specific type in your data, you can pass them in the function. For example, we can be certain that there is no seasonality in the trend data, that we used in previous sections. So we can ask to exclude different types of seasonal models from the set of models (this should speed up the calculations and will result in a non-seasonal ETS model.):

```{r}
# Select the most appropriate non-seasonal model with ets()
ets_ZZN <- ets(trend_data_train, model="ZZN")
ets_ZZN
```

```{r}
# The same thing with es()
es_ZZN <- es(trend_data_train, model="ZZN", silent="a")
es_ZZN
```

## Rolling origin

So far, we have produced forecasts from one and the same point in time, from the same origin. The technique that we used is called “static origin”. The idea behind “rolling origin” is to produce several forecasts by moving the origin of the forecast.

![](./Rolling Origin Evaluation.png)

This is an iterative procedure and on each iteration the **train set size is increased by one** observation, while the **test set size is decreased by one**.

Let’s try this technique on the example of “medium_noise” data:

```{r}
# Load the Data
medium_noise <- read.csv("./medium_noise.csv", header=FALSE)
# Convert to Time Series
medium_noise <- ts(medium_noise, frequency=12, start=c(2012,1))
plot(medium_noise)
```

We will produce ten forecasts using ETS(A,N,N) model with the **horizon of 12**. First, write down the length of the series (48). Now we need to figure out the **lengths of train and test sets**. In order to do that you can use the following formulae:

$$T_0=T-h-n+1$$

$$T_1=h+n-1$$

where $T_0$ is the size of train set, $T_1$ is the size of test set, $h$ is forecasting horizon and $n$ is number of origins.

Now let us set training and test sets, and we do not need to preserve the time series structure of the test set, so we can simply save it as a vector:


```{r}
# Set horizon and number of rolling origins
h <- 12
origins <- 10
medium_noise_length <- length(medium_noise)
train_length <- medium_noise_length - h - origins + 1
test_length <- h + origins - 1
medium_noise_train <- ts(medium_noise[1:train_length],
                         frequency = frequency(medium_noise),
                         start = start(medium_noise))
medium_noise_test <- medium_noise[(train_length+1):medium_noise_length]
```

Now that we managed to split the data into two parts, we will prepare **matrix** for forecasts and for the holdout values, so that we can easily calculate any error we want. In order to remember, what is what, we will give names to the rows and columns:

```{r}
medium_noise_forecasts <- matrix(NA, nrow=origins, ncol=h)
medium_noise_holdout <- matrix(NA, nrow=origins, ncol=h)
colnames(medium_noise_forecasts) <- paste0("horizon",c(1:h))
rownames(medium_noise_forecasts) <- paste0("origin",c(1:origins))
dimnames(medium_noise_holdout) <- dimnames(medium_noise_forecasts)
```

Finally we can start producing forecasts using rolling origin. We will need a loop for that:

```{r}
for(i in 1:origins){
# Create a ts object out of the medium_noise data
    our_train_set <- ts(medium_noise[1:(train_length+i-1)],
                        frequency=frequency(medium_noise),
                        start=start(medium_noise))
# Write down the holdout values from the test set
    medium_noise_holdout[i,] <- medium_noise_test[i-1+(1:h)]
# Produce forecasts and write them down
    medium_noise_forecasts[i,] <- forecast(ets(our_train_set,"ANN"),h=h)$mean
}
```

After that we will have two matrices with forecasts and holdouts. You can then work with them, calculating different types of errors. For example, **MAE** for each horizon can be calculated following way:

```{r}
colMeans(abs(medium_noise_holdout - medium_noise_forecasts))
```

Now we have these two matrices and can calculate different errors and analyse how our fore- casting methods perform on different horizons. We can also use other statistical methods you may know (calculate mean, median, quantiles, standard deviation etc) for that.


